Have you ever felt like you are always writing the same connectors to the same data systems over and over again? I often find myself writing a new way to connect to Snowflake or AWS S3 every other project. It becomes annoying. Writing new connectors feels redundant. Copying existing connectors leads to a poor fit and technical debt. Third-party services are not always easy to use or extend. It was these thoughts that led me to brainstorm the idea for my newest project.
Vision
I am calling this project tippet. Tippet gives developers the freedom to move data from one system to another with ease and speed. Tippet will connect "Sources" with "Sinks", letting developers specify what data they want to move, where it comes from, and where it goes. I will also focus on converting between data formats (database rows to Parquet, for example). Although there are already a lot of great ways to move data around in Python, I felt like none of them really balance speed, developer friendliness, and extensibility. Project tippet will be built from the ground up with these three core ideas in mind and in balance (as best I can).
Three Core Tenets
- Speed
- Ease of Use
- Extensibility
These three core tenets led me to want to write project tippet in Go. All of this will be backed by Apache Arrow to allow for zero-copy data transfer between the Go core and the eventual Python bindings.
What tippet is Not
Tippet is not a full ETL tool. Nor will it lock you into some kind of ecosystem. How you run tippet and how you extend it are up to you and your team.
Why Not Existing Tools?
Tools like Spark, Meltano, and dltHub are great options. They have their strengths, weaknesses, and learning curves. Project tippet hopes to stand out from these tools with its concurrency and the three core tenets. My goal is blazing-fast performance on large datasets, while maintaining a clean and simple codebase that can be understood and extended by anyone.
Why Go?
I decided to go with Go because of its concurrency and performance. Go is known for being readable and easy to write, and speed and concurrency go hand in hand with it. Lastly, Go compiles down to a lightweight binary that can run cross-platform. Go already has libraries for creating bindings to Python, which will allow Python devs to make use of my new tool.
I also want to start this project to learn a new language. Go has been on my radar for a long time. I have spent time here and there with other languages, including Rust and Scala. However, I never really finished any significant projects with them. I hope that having a clear vision in mind will help me stick with my current goals.
Why Not Python/Rust/etc?
I am already pretty comfortable with Python. I use it every day for work and it does a great job. However, I want to spend time with a new language that can really give Python a run for its money performance-wise. Concurrency and performance in Python can be limiting.
I thought about writing this project in Rust as well. Rust has a lot of benefits, and projects like Polars have shown just how powerful it is in the data space. In the end, I chose Go because it has a very clean and simple concurrency model that gets the job done. Go is also known to be easy to learn, read, and write.
Project Timeline
My first goal is to get acquainted with Go while I start the basics of the app. I will then set up interfaces for Sources and Sinks. Next I will work on a Postgres Source and a Parquet Sink. This will give me a lot of insight into how the project will look in Go. At this point I will also introduce Apache Arrow into the library and consider how to set up the command-line commands that call the app and pass in connection details. I will then introduce goroutines to work on parallelism.
My MVP will be a CLI command that connects to Postgres and extracts data in parallel, through Arrow, into a Parquet file.
After this I will start looking into more connectors and Python bindings. I also plan to include tests all along the way.
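To make the MVP a little more concrete, here is a minimal sketch of how the command-line side and the parallel extraction could fit together, using only Go's standard library. The flag names (-dsn, -table, -out, -workers), the Config and Chunk types, and the helper functions are all assumptions for illustration. The fetch function is a stand-in that just counts rows; a real version would query Postgres and write Parquet.

```go
package main

import (
	"flag"
	"fmt"
	"sync"
)

// Config holds hypothetical CLI options; none of these names are final.
type Config struct {
	DSN     string
	Table   string
	Out     string
	Workers int
}

// ParseArgs reads connection details from command-line arguments.
func ParseArgs(args []string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("tippet", flag.ContinueOnError)
	fs.StringVar(&cfg.DSN, "dsn", "", "Postgres connection string")
	fs.StringVar(&cfg.Table, "table", "", "table to extract")
	fs.StringVar(&cfg.Out, "out", "out.parquet", "output file path")
	fs.IntVar(&cfg.Workers, "workers", 4, "number of parallel extract workers")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	if cfg.DSN == "" || cfg.Table == "" {
		return cfg, fmt.Errorf("both -dsn and -table are required")
	}
	if cfg.Workers < 1 {
		return cfg, fmt.Errorf("-workers must be at least 1")
	}
	return cfg, nil
}

// Chunk is a half-open row range [Start, End).
type Chunk struct{ Start, End int }

// SplitRange divides [0, total) into at most n nearly equal chunks.
func SplitRange(total, n int) []Chunk {
	if total <= 0 || n < 1 {
		return nil
	}
	if n > total {
		n = total
	}
	chunks := make([]Chunk, 0, n)
	size, rem := total/n, total%n
	start := 0
	for i := 0; i < n; i++ {
		end := start + size
		if i < rem {
			end++ // spread the remainder over the first chunks
		}
		chunks = append(chunks, Chunk{Start: start, End: end})
		start = end
	}
	return chunks
}

// ExtractParallel runs fetch once per chunk, each in its own goroutine,
// and returns the total number of rows the fetches reported.
func ExtractParallel(chunks []Chunk, fetch func(Chunk) int) int {
	var wg sync.WaitGroup
	counts := make([]int, len(chunks)) // one slot per goroutine, so no locking is needed
	for i, c := range chunks {
		wg.Add(1)
		go func(i int, c Chunk) {
			defer wg.Done()
			counts[i] = fetch(c)
		}(i, c)
	}
	wg.Wait()
	total := 0
	for _, n := range counts {
		total += n
	}
	return total
}

func main() {
	// Example arguments; a real CLI would pass os.Args[1:] instead.
	args := []string{"-dsn", "postgres://localhost/demo", "-table", "events"}
	cfg, err := ParseArgs(args)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Pretend the table has 1000 rows; each worker "fetches" its own range.
	rows := ExtractParallel(SplitRange(1000, cfg.Workers), func(c Chunk) int {
		return c.End - c.Start
	})
	fmt.Printf("would write %d rows from %s to %s\n", rows, cfg.Table, cfg.Out)
}
```

Splitting the table into row ranges is only one possible way to parallelize the extract; how the real Postgres Source partitions its reads is still an open design question.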
Conclusion
Join me on the journey into Go and tippet. Keep an eye on my blog for new entries, or follow along with my GitHub project to see what is in active development.