Streaming

Streaming Engine

Apache Apex

[Java] - unified platform for big data stream and batch processing.

Apache Ballista 2.0k updated yesterday

[Rust] - distributed compute platform powered by Apache Arrow.

Apache Flink

[Java] - system for high-throughput, low-latency data stream processing that supports stateful computation, data-driven windowing semantics and iterative stream processing.

Apache Heron (incubating) 3.7k (archived)

[Java] - a realtime, distributed, fault-tolerant stream processing engine from Twitter.

Apache Samza 839 updated 10mo ago

[Scala/Java] - distributed stream processing framework that builds on Kafka (messaging, storage) and YARN (fault tolerance, processor isolation, security, and resource management).

Apache Spark Streaming 43.0k updated yesterday

[Scala] - makes it easy to build scalable fault-tolerant streaming applications.

Apache Storm 6.7k updated 3d ago

[Clojure/Java] - distributed real-time computation system. Storm is to stream processing what Hadoop is to batch processing.

ArkFlow 1.3k updated 3d ago

[Rust] - High-performance Rust stream processing engine, providing powerful data stream processing capabilities, supporting multiple input/output sources and processors.

Arroyo 4.8k updated yesterday

[Rust] - a distributed stream processing engine. Supports SQL and Rust pipelines. Scales up to millions of events per second. Supports stateful operations like windows and joins, state checkpointing for fault-tolerance and recovery of pipelines. Uses the Timely Dataflow model.

AthenaX 1.2k (archived)

[Java] - Uber's Stream Analytics Framework used in production

Bytewax 2.0k updated 1y ago

[Python] - data parallel, distributed, stateful stream processing framework.

CocoIndex 6.6k updated yesterday

[Rust/Python] - ETL framework for building fresh indexes for AI, with real-time incremental updates.

Faust 6.8k updated 1y ago

[Python] - stream processing library, porting the ideas from Kafka Streams to Python

Gearpump 758 updated 4y ago

[Scala] - lightweight real-time distributed streaming engine built on Akka.

Hazelcast Jet

[Java] - A general purpose distributed data processing engine, built on top of Hazelcast.

hailstorm 93 updated 11y ago

[Haskell] - distributed stream processing with exactly-once semantics based on Storm.

Maki Nage 42 updated 3y ago

[Python] - A stream processing framework for data scientists, based on Kafka and ReactiveX.

mantis 1.5k updated today

[Java] - Netflix's platform to build an ecosystem of realtime stream processing applications

mupd8(muppet) 128 (archived)

[Scala/Java] - MapReduce-style framework for processing fast/streaming data.

NebulaStream 77 updated yesterday

[C++] - High-performance, general-purpose, end-to-end data-management system for cloud-edge-sensor environments.

Numaflow 2.4k updated 2d ago

[Java/Python/Go/Rust] - Kubernetes-native stream processing platform with a language-agnostic framework. Scalable and cost-efficient.

Onyx 2.0k (archived)

[Clojure] - Distributed, masterless, high performance, fault tolerant data processing.

Pathway 62.5k updated yesterday

[Python] - The fastest data processing engine supporting unified workflows for batch, streaming data, and LLM applications.

s4 43 (archived)

[Java] - general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.

SABER

[Java/C] - Window-Based Hybrid CPU/GPU Stream Processing Engine.

Scramjet Cloud Platform 69 updated 1y ago

[Python/JavaScript/Node.js] - data processing engine for running multiple data processing apps (sequences) written in Python, JavaScript or TypeScript

SPQR

[Java] - dynamic framework for processing high-volume data streams through pipelines.

tigon 285 (archived)

[C++/Java] - high-throughput real-time stream processing framework built on Hadoop and HBase.

Teknek 9 updated 10y ago

[Java] - Simple, elegant stream processing with interactive prototyping shell SOL (Stream Operator Language) Mesos, designed for high performance data processing jobs that require flexibility & control.

Trill 1.3k updated 2y ago

[.NET/C#] - Trill is a high-performance one-pass in-memory streaming analytics engine from Microsoft Research.

Wallaroo 1.5k updated 5y ago

[Python] - A fast stream-processing framework. Wallaroo makes it easy to react to data in real time. By eliminating infrastructure complexity, going from prototype to production has never been simpler.

LightSaber 73 updated 4y ago

[C++] - Multi-core Window-Based Stream Processing Engine. LightSaber uses code generation for efficient window aggregation.

HStreamDB 725 updated 1y ago

[Haskell] - The streaming database built for IoT data storage and real-time processing.

Kuiper 1.7k updated 2d ago

[Go] - Lightweight IoT data analytics and streaming software for the edge, implemented in Go; it can run on all kinds of resource-constrained edge devices.

RisingWave 8.9k updated yesterday

[Rust] - A PostgreSQL-compatible streaming database designed for building event-driven applications, real-time ETL pipelines, continuous analytics services, and feature stores for AI applications. It excels at extracting fresh and consistent insights from real-time event streams, database CDC, and time series data with sub-second latency. It unifies streaming and batch processing, enabling users to ingest, join, and analyze both live and historical data at cloud scale.

Data Pipeline

Apache Kafka 32.2k updated yesterday

[Scala/Java] - distributed, partitioned, replicated commit log service, which provides the functionality of a messaging system, but with a unique design.

Apache Pulsar 15.2k updated 2d ago

[Java] - distributed pub-sub messaging platform with a very flexible messaging model and an intuitive client API.

Apache RocketMQ 22.4k updated yesterday

[Java] - distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.

AutoMQ 9.6k updated 2d ago

[Scala/Java] - cloud-first alternative to Kafka that offloads durability to S3 and EBS. 100% Kafka compatible. 10x more cost-effective. Autoscales in seconds. Single-digit ms latency.

brooklin 959 updated 8d ago

[Java] - a distributed system from LinkedIn for streaming data between heterogeneous source and destination systems with high reliability and throughput at scale (replaced databus).

Bruin 1.5k updated 2d ago

[Go] - End-to-end data pipeline tool combining ingestion from 50+ sources, SQL/Python transformations, and built-in data quality checks in a single CLI.

camus 882 (archived)

[Java] - LinkedIn's Kafka -> HDFS pipeline.

databus 3.7k updated 2y ago

[Java] - LinkedIn's source-agnostic distributed change data capture system.

flume 2.6k updated 1y ago

[Java] - distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.

fluvio 5.2k updated 5d ago

[Rust/WASM] - Real-time programmable data streaming platform with in-line computation capabilities.

ingestr 3.4k updated 2d ago

[Python] - CLI tool to copy data between any source and destination with a single command. Supports 50+ connectors including databases, SaaS apps, and data warehouses.

Gazette 785 updated yesterday

[Go] - Distributed streaming infrastructure built on cloud storage which makes it easy to mix and match batch and streaming paradigms.

metaq 1.3k updated 6y ago

[Java] - Taobao's highly available, high-performance distributed messaging system.

NATS streaming 2.5k (archived)

[Go] - fast disk-backed messaging solution

nsq 25.9k updated 8mo ago

[Go] - realtime distributed messaging platform designed to operate at scale, handling billions of messages per day.

Redpanda 11.9k updated yesterday

[C++] - Redpanda is Kafka compatible, ZooKeeper-free, JVM-free and source available.

RudderStack 4.4k updated 2d ago

[Go] - an open source customer data infrastructure (Segment, mParticle alternative).

suro 797 (archived)

[Java] - data pipeline service for collecting, aggregating, and dispatching large volume of application events including log data.

Streaming Library

Streamiz 530 updated 13d ago

[C#] - a .NET stream processing library for Apache Kafka.

Daggy 160 updated 4mo ago

[C++] - real-time stream aggregation and capture.

Benthos 8.6k updated yesterday

[Go] - Benthos is a high performance and resilient message streaming service, able to connect various sources and sinks and perform arbitrary actions, transformations and filters on payloads

FS2(prev. 'Scalaz-Stream') 2.4k updated 8d ago

[Scala] - Compositional, streaming I/O library for Scala.

FastStream 5.1k updated 2d ago

[Python] - powerful and easy-to-use Python library that simplifies writing producers and consumers for message queues, handling all the parsing, networking, and documentation generation automatically. Supports multiple protocols such as Apache Kafka, RabbitMQ, and the like.

monix 1.9k updated 1mo ago

[Scala] - high-performance Scala / Scala.js library for composing asynchronous and event-based programs.

Quix Streams 1.5k updated 2d ago

[Python] - a streaming library originally designed for the McLaren Formula 1 racing team that can process high volumes of time-series data with up to nanosecond precision using Apache Kafka as a message broker.

Scramjet Node.js 40 updated 3y ago

[Node.js] - functional reactive stream programming framework written on top of Node.js object streams + the legacy Scramjet.js version

Scramjet Python 35 updated 2y ago

[Python] - functional reactive stream programming framework written from scratch operating on object, string and buffer streams.

Scramjet C++ 3 updated 3y ago

[C++] - functional reactive stream programming framework written on top of Node.js object streams.

Streamline 166 (archived)

[Java] - Stream Analytics Framework by Hortonworks, designed as a wrapper around existing streaming solutions like Storm. Aimed to allow users to drag-and-drop streaming components to focus on business logic.

StreamAlert 2.9k updated 2y ago

[Python] - Airbnb's Real-time Data Analysis and Alerting.

Swave 172 updated 7y ago

[Scala] - A lightweight Reactive Streams Infrastructure Toolkit for Scala.

Streamz 1.3k updated 29d ago

[Python] - A lightweight library for building pipelines to manage continuous streams of data; supports complex pipelines that involve branching, joining, flow control, feedback, back pressure, and so on.

Stream Ops

[Java] - A fully embeddable data streaming engine and stream processing API for Java.

Substation

[Go] - Substation is a cloud native data pipeline and transformation toolkit written in Go.

Tributary 460 updated 22d ago

[Python] - A Python library for constructing dataflow graphs. Supports synchronous, reactive data streams built using Python generators that mimic complex event processors, as well as lazily-evaluated acyclic graphs and functional currying streams.

YoMo 1.9k updated yesterday

[Go] - An open source streaming serverless framework for building low-latency geo-distributed systems. YoMo is built atop the QUIC transport protocol and a functional reactive programming interface.

Mediapipe 34.3k updated yesterday

Cross-platform, customizable ML solutions for live and streaming media.