Project Awesome

Data Engineering

Collection 8.4k stars GitHub

Workflow

Bruin 1.5k updated 22d ago

End-to-end data pipeline tool that combines ingestion, transformation (SQL + Python), and data quality in a single CLI. Connects to BigQuery, Snowflake, PostgreSQL, Redshift, and more. Includes a VS Code extension with live previews.

Luigi 18.7k updated 27d ago

A Python module that helps you build complex pipelines of batch jobs.

CronQ

A cron-like application scheduling system. Used with Luigi. Deprecated.

Airflow 44.8k updated 20d ago

A system to programmatically author, schedule, and monitor data pipelines.

Pinball 1.0k (archived)

DAG-based workflow manager. Job flows are defined programmatically in Python. Supports output passing between jobs.
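To illustrate what "output passing between jobs" means in a DAG-based workflow, here is a minimal standard-library sketch. It does not use Pinball's API; the job names, the `deps` mapping, and the `upstream` argument are hypothetical and exist only for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each receives a dict of its upstream jobs' outputs.
def extract(upstream):
    return [3, 1, 2]

def transform(upstream):
    return sorted(upstream["extract"])

def load(upstream):
    return f"loaded {len(upstream['transform'])} rows"

jobs = {"extract": extract, "transform": transform, "load": load}

# The DAG, declared explicitly: job name -> set of jobs it depends on.
deps = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}

outputs = {}
# static_order() yields each job only after all of its dependencies.
for name in TopologicalSorter(deps).static_order():
    upstream = {dep: outputs[dep] for dep in deps[name]}
    outputs[name] = jobs[name](upstream)

print(outputs["load"])
```

A real workflow manager adds scheduling, retries, and persistence on top of this ordering step.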

Dagster 15.1k updated 21d ago

An open-source Python library for building data applications.

Hamilton

A lightweight library to define data transformations as a directed-acyclic graph (DAG). If you like dbt for SQL transforms, you will like Hamilton for Python processing.
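The idea of declaring transformations as a DAG of plain Python functions can be sketched with the standard library alone. This is an illustration of the concept, not Hamilton's actual API: the `resolve` helper and the column names are assumptions made up for the example. Each function's name is the node it produces, and its parameter names are the nodes it depends on.

```python
import inspect

# Each function is a node: its name is the output, its parameters are its inputs.
def spend_per_signup(spend, signups):
    return [s / n for s, n in zip(spend, signups)]

def avg_spend_per_signup(spend_per_signup):
    return sum(spend_per_signup) / len(spend_per_signup)

def resolve(name, nodes, inputs, cache=None):
    """Compute node `name`, recursively resolving the parameters it depends on."""
    cache = {} if cache is None else cache
    if name in inputs:  # raw input supplied by the caller
        return inputs[name]
    if name not in cache:  # compute each node at most once
        fn = nodes[name]
        params = inspect.signature(fn).parameters
        args = {p: resolve(p, nodes, inputs, cache) for p in params}
        cache[name] = fn(**args)
    return cache[name]

nodes = {f.__name__: f for f in (spend_per_signup, avg_spend_per_signup)}
inputs = {"spend": [10.0, 20.0, 30.0], "signups": [1, 2, 3]}

result = resolve("avg_spend_per_signup", nodes, inputs)
print(result)
```

Because the dependency graph is inferred from function signatures, there is no separate wiring step to keep in sync with the code.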

Kestra 26.6k updated 22d ago

Scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.

RudderStack 4.4k updated 20d ago

A warehouse-first Customer Data Platform that enables you to collect data from every application, website and SaaS platform, and then activate it in your warehouse and business tools.

PACE

An open-source framework that allows you to enforce agreements on how data should be accessed, used, and transformed, regardless of the data platform (Snowflake, BigQuery, Databricks, etc.).

Multiwoven 1.6k updated 21d ago

The open-source reverse ETL, data activation platform for modern data teams.