Big Data

Key-value Data Model

Bolt 14.6k (archived)

an embedded key-value database for Go.

BTDB 140 updated yesterday

Key-value database in .NET with Object DB Layer, RPC, dynamic IL and much more.

BuntDB 4.8k updated 1y ago

a fast, embeddable, in-memory key/value database for Go with custom indexing and geospatial support.

Edis 554 updated 10y ago

a protocol-compatible server replacement for Redis.

ElephantDB 558 updated 11y ago

Distributed database specialized in exporting data from Hadoop.

GhostDB

a distributed, in-memory, general purpose key-value data store that delivers microsecond performance at any scale.

Graviton 423 updated 4y ago

a simple, fast, versioned, authenticated, embeddable key-value store database in pure Go(lang).

GridDB 2.5k updated 6d ago

suitable for sensor data stored as time series.

HyperDex 1.4k updated 1y ago

a scalable, next generation key-value and document store with a wide array of features, including consistency, fault tolerance and high performance.

LinkedIn Krati 26 updated 13y ago

a simple persistent data store with very low latency and high throughput.

Riak 4.0k updated 1y ago

a decentralized datastore.

Storehaus 464 updated 5y ago

library to work with asynchronous key value stores, by Twitter.

SummitDB 1.4k (archived)

an in-memory, NoSQL key/value database, with disk persistence and using the Raft consensus algorithm.

Tarantool 3.6k updated yesterday

an efficient NoSQL database and a Lua application server.

TiKV

a distributed key-value database powered by Rust and inspired by Google Spanner and HBase.

Tile38 9.6k updated 6d ago

a geolocation data store, spatial index, and realtime geofence, supporting a variety of object types including latitude/longitude points, bounding boxes, XYZ tiles, Geohashes, and GeoJSON.

TreodeDB 175 updated 10y ago

key-value store that's replicated and sharded and provides atomic multirow writes.

Time-Series Databases

Machine Learning

brain

Neural networks in JavaScript.

Oryx

Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning.

convnetjs 11.1k updated 3y ago

Deep Learning in JavaScript. Train Convolutional Neural Networks (or ordinary ones) in your browser.

DataVec

A vectorization and data preprocessing library for deep learning in Java and Scala. Part of the Deeplearning4j ecosystem.

Decider 383 updated 9y ago

Flexible and Extensible Machine Learning in Ruby.

Etsy Conjecture 359 (archived)

scalable Machine Learning in Scalding.

Feast 6.8k updated yesterday

A feature store for the management, discovery, and access of machine learning features. Feast provides a consistent view of feature data for both model training and model serving.

H2O 7.5k updated yesterday

statistical, machine learning and math runtime with Hadoop. R and Python.

Karate Club 2.3k updated 1y ago

An unsupervised machine learning library for graph structured data. Python

Keras 64.0k updated 2d ago

An intuitive neural net API inspired by Torch that runs atop Theano and TensorFlow.

Lambdo 1 updated 7y ago

Lambdo is a workflow engine which significantly simplifies the analysis process by unifying feature engineering and machine learning operations.

Little Ball of Fur 713 updated 3mo ago

A subsampling library for graph structured data. Python

MLPNeuralNet 903 updated 9y ago

Fast multilayer perceptron neural network library for iOS and Mac OS X.

ML Workspace 3.5k updated 1y ago

All-in-one web-based IDE specialized for machine learning and data science.

ND4J

A matrix library for the JVM. NumPy for Java.

nupic 6.4k updated 1y ago

Numenta Platform for Intelligent Computing: a brain-inspired machine intelligence platform, and biologically accurate neural network based on cortical learning algorithms.

PyTorch Geometric Temporal

a temporal extension library for PyTorch Geometric.

RL4J

Reinforcement learning for Java and Scala. Includes Deep Q-learning and A3C algorithms, and integrates with OpenAI's Gym. Runs in the Deeplearning4j ecosystem.

scikit-learn 65.5k updated yesterday

scikit-learn: machine learning in Python.

Shapley

A data-driven framework to quantify the value of classifiers in a machine learning ensemble.

TensorFlow 194.3k updated yesterday

Library from Google for machine learning using data flow graphs.

Velox

System for serving machine learning predictions.

Vowpal Wabbit 8.7k updated 6d ago

learning system sponsored by Microsoft and Yahoo!.

BIDMach 919 updated 3y ago

CPU and GPU-accelerated Machine Learning Library.

Applications

411 968 (archived)

a web application for alert management resulting from scheduled searches into Elasticsearch.

Adobe spindle 330 updated 11y ago

Next-generation web analytics processing with Scala, Spark, and Parquet.

Argus

Time series monitoring and alerting platform.

AthenaX 1.2k (archived)

a streaming analytics platform that enables users to run production-quality, large scale streaming analytics using Structured Query Language (SQL).

Atlas 3.5k updated 2d ago

a backend for managing dimensional time series data.

ElastAlert

ElastAlert is a simple framework for alerting on anomalies, spikes, or other patterns of interest from data in Elasticsearch.

Eventhub 1.3k updated 4y ago

open source event analytics platform.

Hermes 851 updated 5d ago

asynchronous message broker built on top of Kafka.

Kapacitor 2.4k updated 6d ago

an open source framework for processing, monitoring, and alerting on time series data.

PivotalR 127 (archived)

R on Pivotal HD / HAWQ and PostgreSQL.

Rakam 795 updated 4y ago

open-source real-time custom analytics platform powered by PostgreSQL, Kinesis and PrestoDB.

SnappyData

a distributed in-memory data store for real-time operational analytics, delivering stream analytics, OLTP (online transaction processing) and OLAP (online analytical processing) built on Spark in a single integrated cluster.

Snowplow 7.0k updated 5d ago

enterprise-strength web and event analytics, powered by Hadoop, Kinesis, Redshift and Postgres.

Substation

Substation is a cloud native data pipeline and transformation toolkit written in Go.

Data Visualization

Airpal 2.8k (archived)

Web UI for PrestoDB.

Arbor 2.7k updated 6y ago

graph visualization library using web workers and jQuery.

Banana 671 updated 7mo ago

visualize logs and time-stamped data stored in Solr. Port of Kibana.

Bloomery

Web UI for Impala.

CartoDB 2.8k updated 10mo ago

open-source or freemium hosting for geospatial databases with powerful front-end editing capabilities and a robust API.

Chartist.js 99 updated 1y ago

another open source HTML5 Charts visualization.

Cubism 4.9k updated 11mo ago

JavaScript library for time series visualization.

Dash 24.5k updated yesterday

Analytical Web Apps for Python, R, Julia, and Jupyter. Built on top of Plotly; no JS required.

DevExtreme React Chart

High-performance plugin-based React chart for Bootstrap and Material Design.

Echarts 66.0k updated 2d ago

Baidu's enterprise charts.

Envisionjs 1.6k updated 6y ago

dynamic HTML5 visualization.

Freeboard 6.5k updated 2y ago

open source real-time dashboard builder for IoT and other web mashups.

Gephi 6.4k updated 10d ago

An award-winning open-source platform for visualizing and manipulating large graphs and network connections. It's like Photoshop, but for graphs. Available for Windows and Mac OS X.

Matplotlib 22.6k updated 2d ago

plotting with Python.

Plotly.js

The open source JavaScript graphing library that powers Plotly.

Recline 2.3k updated 2d ago

simple but powerful library for building data applications in pure JavaScript and HTML.

Redash 28.3k updated 5d ago

open-source platform to query and visualize data.

Sigma.js 11.9k updated 5d ago

JavaScript library dedicated to graph drawing.

Superset 71.1k updated yesterday

a data exploration platform designed to be visual, intuitive and interactive, making it easy to slice, dice and visualize data and perform analytics at the speed of thought.

Vega

a visualization grammar.

Zeppelin

a notebook-style collaborative data analysis tool.

DataSphere Studio 3.3k updated 4mo ago

one-stop data application development management portal.

D3.compose 696 updated 3y ago

Compose complex, data-driven visualizations from reusable charts and components.