Project Awesome

Data Engineering

Workflow

Bruin 1.5k updated 2d ago

End-to-end data pipeline tool that combines ingestion, transformation (SQL + Python), and data quality checks in a single CLI. Connects to BigQuery, Snowflake, PostgreSQL, Redshift, and more. Includes a VS Code extension with live previews.

Luigi 18.7k updated 7d ago

A Python module that helps you build complex pipelines of batch jobs.

CronQ

A cron-like application scheduling system, used with Luigi. Deprecated.

Airflow 44.8k updated today

A system to programmatically author, schedule, and monitor data pipelines.

Pinball 1.0k (archived)

DAG-based workflow manager. Job flows are defined programmatically in Python. Supports output passing between jobs.

Dagster 15.1k updated today

An open-source Python library for building data applications.

Hamilton

A lightweight library for defining data transformations as a directed acyclic graph (DAG). If you like dbt for SQL transforms, you will like Hamilton for Python processing.

Kestra 26.6k updated 2d ago

Scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.

RudderStack 4.4k updated today

A warehouse-first Customer Data Platform that enables you to collect data from every application, website and SaaS platform, and then activate it in your warehouse and business tools.

PACE

An open-source framework that allows you to enforce agreements on how data should be accessed, used, and transformed, regardless of the data platform (Snowflake, BigQuery, Databricks, etc.).

Multiwoven 1.6k updated today

The open-source reverse ETL and data activation platform for modern data teams.