Python > Data Science
Data analysis and machine learning.
Machine Learning
Toolkit for making real-world machine learning and data analysis applications in C++ (Python bindings).
Extension and helper modules for Python's data analysis and machine learning libraries.
Re-written Sklearn and Statsmodels: 50%+ faster, 50%+ less RAM usage, GPU support.
Ensemble Methods
Imbalanced Datasets
Random Forests
Kernel Methods
Factorization machines in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
A library for Factorization Machines. <img height="20" src="img/sklearn_big.png" alt="sklearn">
TensorFlow implementation of an arbitrary order Factorization Machine. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/tf_big2.png" alt="TensorFlow">
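The factorization machine entries above all build on the same model. As a minimal sketch, assuming the standard second-order formulation (a bias, linear weights, and pairwise interactions weighted by dot products of latent vectors), the prediction can be computed either with the explicit pairwise sum or with the usual linear-time rewrite. The function names and random data below are illustrative only and are not taken from any of the listed libraries.

```python
import random


def fm_predict_naive(x, w0, w, V):
    """Second-order FM score using the explicit sum over feature pairs."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            # Interaction weight is the dot product of the two latent vectors.
            dot = sum(V[i][f] * V[j][f] for f in range(len(V[i])))
            score += dot * x[i] * x[j]
    return score


def fm_predict_fast(x, w0, w, V):
    """Same score in O(k * n) via 0.5 * ((sum v x)^2 - sum (v x)^2) per factor."""
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (s * s - s_sq)
    return score


# Illustrative random parameters (assumed values, not trained weights).
rng = random.Random(0)
n_features, k = 5, 3
x = [rng.random() for _ in range(n_features)]
w0 = 0.1
w = [rng.uniform(-1, 1) for _ in range(n_features)]
V = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n_features)]

naive = fm_predict_naive(x, w0, w, V)
fast = fm_predict_fast(x, w0, w, V)
print(abs(naive - fast) < 1e-9)
```

Both functions should agree up to floating-point error; the second form is the one that makes factorization machines practical on wide, sparse inputs.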
Deep Learning
PyTorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
PyTorch Lightning is just organized PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
High-level library to help with training neural networks in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
A scikit-learn compatible neural network library that wraps PyTorch. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
TensorFlow
Computation using data flow graphs for scalable machine learning by Google.
Deep Learning and Reinforcement Learning Library for Researchers and Engineers.
A platform that helps you build, manage and monitor deep learning models.
Keras
JAX
Automated Machine Learning
An AutoML toolkit and a drop-in replacement for a scikit-learn estimator.
Automatic architecture search and hyperparameter optimization for PyTorch.
Natural Language Processing
Data loaders and abstractions for text and NLP. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
Modular Natural Language Processing workflows with Keras. <img height="20" src="img/keras_big.png" alt="Keras based/compatible">
Modules, data sets, and tutorials supporting research and development in Natural Language Processing.
Python binding for <a href="https://github.com/morfologik/morfologik-stemming">Morfologik</a>.
Computer Audition
Computer Vision
Datasets, Transforms, and Models specific to Computer Vision. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
PyTorch3D is FAIR's library of reusable components for deep learning with 3D data. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
Industry-strength Computer Vision workflows with Keras. <img height="20" src="img/keras_big.png" alt="Keras based/compatible">
An efficient video loader for deep learning with smart shuffling that's super easy to digest.
OpenMMLab Foundational Library for Training Deep Learning Models. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
Time Series
A unified framework for machine learning with time series. <img height="20" src="img/sklearn_big.png" alt="sklearn">
Lightning fast forecasting with statistical and econometric models.
Machine learning toolkit dedicated to time-series data. <img height="20" src="img/sklearn_big.png" alt="sklearn">
Module for statistical learning, with a particular emphasis on time-dependent modeling. <img height="20" src="img/sklearn_big.png" alt="sklearn">
Reinforcement Learning
An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym).
An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities.
An engine for high performance multi-agent environments with very large numbers of agents, along with a set of reference environments.
A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines.
An API conversion tool for popular external reinforcement learning environments.
C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
An elegant PyTorch deep reinforcement learning library. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
PyTorch framework for RL research. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
OpenDILab Decision AI Engine. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
A library for Reinforcement Learning in TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
A TensorFlow library for applied reinforcement learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
TensorFlow Reinforcement Learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
A research framework for fast prototyping of reinforcement learning algorithms.
Deep Reinforcement Learning for Keras. <img height="20" src="img/keras_big.png" alt="Keras compatible">
Reinforcement Learning in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
High-quality single-file implementations of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG).
A reinforcement learning library designed for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
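Several entries in this section refer to an API standard for reinforcement learning environments. As a rough sketch of what such a standard looks like, the toy class below follows the Gymnasium-style convention where `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`. It deliberately does not import any of the listed libraries; `CoinFlipEnv` and `run_episode` are hypothetical names made up for this illustration.

```python
import random


class CoinFlipEnv:
    """Toy environment (hypothetical): guess the outcome of the next coin flip."""

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self._rng = random.Random()
        self._steps = 0

    def reset(self, seed=None, options=None):
        # Reseeding makes episodes reproducible when a seed is given.
        if seed is not None:
            self._rng = random.Random(seed)
        self._steps = 0
        observation = 0  # no flip has happened yet
        return observation, {}

    def step(self, action):
        coin = self._rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self._steps += 1
        terminated = False  # this task has no natural terminal state
        truncated = self._steps >= self.max_steps  # time limit reached
        return coin, reward, terminated, truncated, {}


def run_episode(env, policy, seed=None):
    """Run one episode and return the total reward collected."""
    obs, info = env.reset(seed=seed)
    total = 0.0
    done = False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total


env = CoinFlipEnv(max_steps=10)
print(run_episode(env, policy=lambda obs: 1, seed=0))
```

Keeping `terminated` and `truncated` separate lets an agent distinguish a genuine end state from an episode that was cut off by a time limit.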
Graph Machine Learning
A signed/directed graph neural network extension library for PyTorch Geometric.
Python package built to ease deep learning on graphs, on top of existing DL frameworks.
GRAPE is a Rust/Python Graph Representation Learning library for Predictions and Evaluations.
A library to build Graph Neural Networks on the TensorFlow platform.
Graph Manipulation
Learning-to-Rank & Recommender Systems
Probabilistic Graphical Models
Probabilistic Methods
A flexible, scalable deep probabilistic programming library built on PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
Deep Probabilistic Modelling Made Easy. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
Python package for Bayesian Machine Learning with scikit-learn API. <img height="20" src="img/sklearn_big.png" alt="sklearn">
Supervised domain-agnostic prediction framework for probabilistic modelling by The Alan Turing Institute. <img height="20" src="img/sklearn_big.png" alt="sklearn">
Bayesian Deep Learning methods with Variational Inference for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
Model Explanation
moDel Agnostic Language for Exploration and explanation. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/R_big.png" alt="R inspired/ported lib">
A data-driven framework to quantify the value of classifiers in a machine learning ensemble.
Contrastive Explanation (Foil Trees). <img height="20" src="img/sklearn_big.png" alt="sklearn">
Visual analysis and diagnostic tools to facilitate machine learning model selection. <img height="20" src="img/sklearn_big.png" alt="sklearn">
An intuitive library to add plotting functionality to scikit-learn objects. <img height="20" src="img/sklearn_big.png" alt="sklearn">
A unified approach to explain the output of any machine learning model. <img height="20" src="img/sklearn_big.png" alt="sklearn">
InterpretML implements the Explainable Boosting Machine (EBM), a modern, fully interpretable machine learning model based on Generalized Additive Models (GAMs). This open-source package also provides visualization tools for EBMs, other glass-box models, and black-box explanations. <img height="20" src="img/sklearn_big.png" alt="sklearn">
A library for debugging/inspecting machine learning classifiers and explaining their predictions.
Explaining the predictions of any machine learning classifier. <img height="20" src="img/sklearn_big.png" alt="sklearn">
FairML is a Python toolbox for auditing machine learning models for bias. <img height="20" src="img/sklearn_big.png" alt="sklearn">
Code for replicating the experiments in the paper Learning to Explain: An Information-Theoretic Perspective on Model Interpretation.
Model analysis tools for TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
A library that implements fairness-aware machine learning algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
Interpreting scikit-learn's decision tree and random forest predictions. <img height="20" src="img/sklearn_big.png" alt="sklearn">
Interpretability and explainability of data and machine learning models.
A visualization of the CapsNet layers to better understand how it works.
A collection of infrastructure and tools for research in neural network interpretability.
Genetic Programming
Genetic Programming in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
Genetic Algorithm in Python. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible"> <img height="20" src="img/keras_big.png" alt="keras">
Optimization
Bayesian optimization in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
Hyperparameters tuning and feature selection using evolutionary algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
Hyper-parameter optimization for sklearn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
Use evolutionary algorithms instead of gridsearch in scikit-learn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
SigOpt wrappers for scikit-learn methods. <img height="20" src="img/sklearn_big.png" alt="sklearn">
A Python implementation of global optimization with Gaussian processes.
Sequential model-based optimization with a scipy.optimize interface.
Bayesian Optimization using GPflow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
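Some of the entries above replace exhaustive grid search with evolutionary algorithms. The sketch below contrasts the two ideas in plain Python under stated assumptions: `score` is a made-up stand-in for a cross-validation score, the parameter names `lr` and `depth` are hypothetical, and the selection, crossover, and mutation choices are one simple variant rather than the method used by any listed library.

```python
import itertools
import random


def score(params):
    """Hypothetical validation score that peaks at lr=0.1, depth=6."""
    return -((params["lr"] - 0.1) ** 2) - 0.01 * (params["depth"] - 6) ** 2


def grid_search(lrs, depths):
    """Evaluate every combination on a fixed grid and keep the best one."""
    candidates = [{"lr": lr, "depth": d} for lr, d in itertools.product(lrs, depths)]
    return max(candidates, key=score)


def evolutionary_search(generations=30, population_size=12, seed=0):
    """Evolve a population with truncation selection, crossover, and mutation."""
    rng = random.Random(seed)
    population = [
        {"lr": rng.uniform(0.001, 1.0), "depth": rng.randint(1, 12)}
        for _ in range(population_size)
    ]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: population_size // 2]  # survivors are kept unchanged
        children = []
        for _ in range(population_size - len(parents)):
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each parameter comes from one of the two parents.
            child = {
                "lr": rng.choice([a["lr"], b["lr"]]),
                "depth": rng.choice([a["depth"], b["depth"]]),
            }
            # Mutation: Gaussian noise on lr, occasional +/-1 step on depth.
            child["lr"] = min(1.0, max(0.001, child["lr"] + rng.gauss(0, 0.05)))
            if rng.random() < 0.3:
                child["depth"] = min(12, max(1, child["depth"] + rng.choice([-1, 1])))
            children.append(child)
        population = parents + children
    return max(population, key=score)


print(grid_search([0.01, 0.1, 1.0], [2, 6, 10]))
print(evolutionary_search())
```

Grid search only ever sees the values placed on the grid, while the evolutionary loop can drift toward good regions between grid points, at the cost of extra tuning knobs such as population size and mutation scale.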
Feature Engineering
General
A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction.
Machine learning on dirty tabular data (especially: string-based variables for classification and regression).
Visualization
General Purposes
Interactive plots
Map
Deployment
No-code in the front, Python in the back. An open-source framework for creating data apps.
Deepnote is a drop-in replacement for Jupyter with an AI-first design, sleek UI, new blocks, and native data integrations. Use Python, R, and SQL locally in your favorite IDE, then scale to Deepnote cloud for real-time collaboration, Deepnote agent, and deployable data apps.
Statistics
Extension to the pandas DataFrame describe function. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
Supply a wrapper `StockDataFrame` based on the `pandas.DataFrame` with inline stock statistics/indicators support.
Data Manipulation
Data Frames
Data.table for Python. <img height="20" src="img/R_big.png" alt="R inspired/ported lib">
GPU DataFrame Library. <img height="20" src="img/pandas_big.png" alt="pandas compatible"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
NumPy and pandas interface to Big Data. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
Allows you to query pandas DataFrames using SQL syntax. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
pandas Google Big Query. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
Universal 1d/2d data containers with Transformers functionality for data analysis by The Alan Turing Institute.
A pure Python implementation of Apache Spark's RDD and DStream interfaces. <img height="20" src="img/spark_big.png" alt="Apache Spark based">
Speed up your pandas workflows by changing a single line of code. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
A package that efficiently applies any function to a pandas dataframe or series in the fastest available manner.
A package that provides feedback on basic pandas operations and finds both business-logic and performance issues.
Pipelines
Functional data manipulation for pandas. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
Dplyr for Python. <img height="20" src="img/R_big.png" alt="R inspired/ported lib">
pandas integration with sklearn. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/pandas_big.png" alt="pandas compatible">
Helps you conveniently work with random or sequential batches of your data and define data processing.
Clean APIs for data cleaning. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
Distributed Computing
Experimentation
Data Validation
A lightweight, flexible, and expressive statistical data testing library.
Validation & testing of ML models and data during model development, deployment, and production.
Evaluation
Library of useful metrics and plots for evaluating recommender systems.
Model evaluation made easy: plots, tables, and markdown reports.
Computations
Web Scraping
Spatial Analysis
Quantum Computing
Qiskit is an open-source SDK for working with quantum computers at the level of circuits, algorithms, and application modules.
A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits.