By: Dwolla

If APIs are the manifestation of products and services that enable innovation and collaboration, open source is the foundation upon which APIs are built.


Open source software development is arguably one of the greatest examples of human collaboration. From the launch of the GNU Project in 1983 and Linux in 1991 to the billions of mobile devices it powers today, open source runs the software around us. From the programming languages we use to the infrastructure our software runs on, the Dwolla API has benefited immensely from open source.

In addition to benefiting from open source software, we are excited to “pay it forward” and contribute back to the open source community. Open source software encapsulates shared experiences and problem solving. We not only want to share what we have learned, but also learn from the community’s feedback and contributions.

Processing event streams

One area we are contributing to is data and analytics. Events are the atomic building blocks of data at Dwolla. We use some great open source tools to process, analyze, and visualize this data. However, we needed a way to query all this data interactively, using the flexibility and scale of Amazon Web Services.
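
As a purely illustrative example (the field names below are hypothetical, not Dwolla's actual schema), each event can be pictured as a small, immutable record:

```python
# Hypothetical event record; field names are illustrative, not Dwolla's
# actual schema. Billions of records like this make up the data we want
# to query interactively.
sample_event = {
    'id': 'd8f7a2c4-91b3-4c6e-8f21-3a5e9b0c7d14',    # unique event identifier
    'topic': 'transfer_created',                      # what happened
    'occurred_at': '2015-10-01T12:34:56Z',            # when it happened
    'payload': {'amount': '25.00', 'currency': 'USD'}
}
```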

We are excited to release Arbalest, a Python data pipeline orchestration library for Amazon S3 and Amazon Redshift. It automates data import into Redshift and makes data query-able at scale in AWS.

Arbalest is the backbone of our data pipeline architecture and has enabled Dwolla to query and analyze billions of events. In a few lines of code, it takes care of the extract, transform, and load steps needed to move event data from S3 into Redshift and make it available for querying, as the sketch below illustrates.
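
As a rough, hedged sketch of what such a pipeline definition can look like: the import paths, class names, and method names below (S3CopyPipeline, JsonObject, Property, bulk_copy) are illustrative stand-ins rather than a definitive reference; see the GitHub README and documentation for the actual API.

```python
# Illustrative sketch of an Arbalest-style pipeline definition. The import
# paths, classes, and methods shown here are hypothetical stand-ins; consult
# the Arbalest README and documentation for the real API.
import psycopg2
from arbalest.redshift import S3CopyPipeline                # hypothetical import
from arbalest.redshift.schema import JsonObject, Property   # hypothetical import

pipeline = S3CopyPipeline(
    aws_access_key_id='YOUR_ACCESS_KEY_ID',
    aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
    bucket='example-event-bucket',
    db_connection=psycopg2.connect(host='example-cluster.abc123.us-east-1.redshift.amazonaws.com',
                                   port=5439, dbname='analytics',
                                   user='etl_user', password='YOUR_PASSWORD'))

# Declare the shape of the JSON events in S3 and the Redshift table to land
# them in, then let the pipeline handle schema creation and the copy.
pipeline.bulk_copy(metadata='pipeline/metadata',
                   source='events/2015/10/01/',
                   schema=JsonObject('events',
                                     Property('id', 'VARCHAR(36)'),
                                     Property('amount', 'DECIMAL(12,2)')))

pipeline.run()
```
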
Arbalest is a lightweight library designed to be composed with existing data tools. It is also written in Python, arguably the de facto programming language for data science. Arbalest embraces configuration as code: unlike sometimes unwieldy configuration files, an Arbalest pipeline is only a few lines of code that can be tested, packaged, and reused. Finally, it automates the complicated and potentially brittle process of ETL (extract, transform, and load).
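
To make the manual alternative concrete: without a library in this role, loading S3 data into Redshift typically means issuing Redshift's COPY command yourself and keeping credentials, S3 paths, and table definitions in sync by hand. The sketch below is illustrative only; the cluster, bucket, table, and credentials are placeholders.

```python
# Hand-rolled S3 -> Redshift load of the kind Arbalest is meant to automate.
# Cluster, bucket, table, and credential values are placeholders.
import psycopg2

connection = psycopg2.connect(host='example-cluster.abc123.us-east-1.redshift.amazonaws.com',
                              port=5439, dbname='analytics',
                              user='etl_user', password='YOUR_PASSWORD')

copy_sql = """
    COPY events
    FROM 's3://example-event-bucket/events/2015/10/01/'
    CREDENTIALS 'aws_access_key_id=YOUR_ACCESS_KEY_ID;aws_secret_access_key=YOUR_SECRET_ACCESS_KEY'
    JSON 'auto'
    TIMEFORMAT 'auto';
"""

with connection, connection.cursor() as cursor:
    # Redshift's COPY pulls the JSON objects directly from S3 in parallel;
    # the transaction is committed when the connection context exits cleanly.
    cursor.execute(copy_sql)
```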

We hope Arbalest enables data analysts and developers to spend less time managing their data and more time answering questions.

Use cases

Why use Arbalest? It is not a MapReduce framework; rather, it is designed to make Amazon Redshift (and all of its strengths) easy to use with typical data workflows and tools, such as ad hoc, interactive analysis of event data at scale.
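
One hedged example of such a workflow: once events have been loaded into Redshift, ad hoc analysis is ordinary SQL through any PostgreSQL-compatible client. The table and column names below are hypothetical.

```python
# Illustrative ad hoc query against Redshift after events have been loaded.
# Table and column names are hypothetical.
import psycopg2

connection = psycopg2.connect(host='example-cluster.abc123.us-east-1.redshift.amazonaws.com',
                              port=5439, dbname='analytics',
                              user='analyst', password='YOUR_PASSWORD')

with connection, connection.cursor() as cursor:
    # Daily event counts across the full history of loaded events.
    cursor.execute("""
        SELECT DATE_TRUNC('day', occurred_at) AS day, COUNT(*) AS event_count
        FROM events
        GROUP BY 1
        ORDER BY 1
    """)
    for day, event_count in cursor.fetchall():
        print(day, event_count)
```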

Getting Started

Arbalest is not meant to replace existing data tools, but to work with them. In our initial release, we have included some batteries, for example, strategies for ingesting time series or sparse data and support for integration with existing pipeline topologies. We have described a few use cases, but we are excited to see more applications of Arbalest in the community.
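
As a hedged illustration of what a time series ingest strategy can work against (not Arbalest's actual implementation): event data is commonly laid out in S3 under date-partitioned prefixes, and an ingest strategy simply walks those prefixes.

```python
# Illustrative only: generate date-partitioned S3 prefixes of the kind a
# time series ingest strategy can iterate over. Bucket and prefix names
# are hypothetical.
from datetime import date, timedelta

def daily_prefixes(bucket, root, start, end):
    """Yield one S3 prefix per day between start and end, inclusive."""
    day = start
    while day <= end:
        yield 's3://{0}/{1}/{2:%Y/%m/%d}/'.format(bucket, root, day)
        day += timedelta(days=1)

for prefix in daily_prefixes('example-event-bucket', 'events',
                             date(2015, 10, 1), date(2015, 10, 3)):
    print(prefix)
# s3://example-event-bucket/events/2015/10/01/
# s3://example-event-bucket/events/2015/10/02/
# s3://example-event-bucket/events/2015/10/03/
```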

For more information on Arbalest, be sure to install the package, check it out and star it on GitHub, read the documentation, or look at our presentation from Tableau Conference 15.

Interested in giving Dwolla’s API a try? Visit the test environment now!

Fred Galoso

This blog post shares insights from Fredrick Galoso, a software developer and technical lead for the data and analytics team here at Dwolla. Fred has led the creation of our data platform, including its data pipeline, predictive analytics, and business intelligence platform. In his free time he contributes to a number of open source software projects, including Gauss, a statistics and data analytics library.

View Arbalest on GitHub now
