Apache Spark
Unified engine for large-scale data processing.
Contents
Packages
Language Bindings
Kotlin API bindings and extensions.
.NET bindings.
An alternative R backend, using dplyr.
Haskell on Apache Spark.
Rust bindings.
Golang bindings.
C# bindings.
Notebooks and IDEs
A Scala kernel for Jupyter.
Web-based notebook that enables interactive data analytics with pluggable backends, integrated plotting, and extensive Spark support out of the box.
Polynote: an IDE-inspired polyglot notebook. It supports mixing multiple languages in one notebook and sharing data between them seamlessly. Its immutable data model encourages reproducible notebooks. Originated at Netflix.
Jupyter magics and kernels for working interactively with remote Spark clusters through Livy.
General Purpose Libraries
A library that brings useful functions from modern database management systems to Apache Spark.
A Scala library with essential Spark functions and extensions to make you more productive.
A native PySpark implementation of spark-daria.
A library of general-purpose functions and UDFs.
joblib backend for running tasks on Spark clusters.
SQL Data Sources
Storage
Storage layer with ACID transactions.
Upserts, Deletes And Incremental Processing on Big Data.
Upserts, Deletes And Incremental Processing on Big Data.
Integration with the lakeFS atomic versioned storage layer.
Bioinformatics
GIS
Graph Processing
Machine Learning Extension
PMML transformer library for Spark ML.
A system to manage machine learning models for spark.ml and scikit-learn.
H2O interoperability layer.
Distributed Deep Learning library.
Execution engine and serialization format that supports deployment of org.apache.spark.ml models without a dependency on SparkSession.
A distributed ML library with support for LightGBM, Vowpal Wabbit, OpenCV, Deep Learning, Cognitive Services, and Model Deployment.
Machine learning orchestration platform.
Middleware
REST server with extensive language support (Python, R, Scala), the ability to maintain interactive sessions, and object sharing.
Simple Spark as a Service that supports object sharing through so-called named objects. JVM only.
IPython protocol-based middleware for interactive applications.
A distributed multi-tenant JDBC server for large-scale data processing and analytics, built on top of Apache Spark.