Probably the best curated list of data science software in Python
Contents
Machine Learning
General Purpose Machine Learning

scikit-learn  Machine learning in Python.

Shogun  Machine learning toolbox.

xLearn  High Performance, Easy-to-use, and Scalable Machine Learning Package.

cuML  RAPIDS Machine Learning Library.

modAL  Modular active learning framework for Python3.

Sparkit-learn  PySpark + scikit-learn = Sparkit-learn.

mlpack  A scalable C++ machine learning library (Python bindings).

dlib  Toolkit for making real world machine learning and data analysis applications in C++ (Python bindings).

MLxtend  Extension and helper modules for Python's data analysis and machine learning libraries.

hyperlearn  Rewritten Sklearn and Statsmodels: 50%+ faster, 50%+ less RAM usage, with GPU support.

Reproducible Experiment Platform (REP)  Machine Learning toolbox for Humans.

scikit-multilearn  Multi-label classification for Python.

seqlearn  Sequence classification toolkit for Python.

pystruct  Simple structured learning framework for Python.

sklearn-expertsys  Highly interpretable classifiers for scikit-learn.

RuleFit  Implementation of the RuleFit algorithm.

metric-learn  Metric learning algorithms in Python.

pyGAM  Generalized Additive Models in Python.

Karate Club  An unsupervised machine learning library for graph structured data.

Little Ball of Fur  A library for sampling graph structured data.

causalml  Uplift modeling and causal inference with machine learning algorithms.

Deepchecks  Validation & testing of ML models and data during model development, deployment, and production.
Automated Machine Learning

TPOT  Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

auto-sklearn  An automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.

MLBox  A powerful Automated Machine Learning Python library.
Ensemble Methods

ML-Ensemble  High performance ensemble learning.

Stacking  Simple and useful stacking library, written in Python.

stacked_generalization  Library for machine learning stacking generalization.

vecstack  Python package for stacking (machine learning technique).
Imbalanced Datasets

imbalanced-learn  Module to perform undersampling and oversampling with various techniques.

imbalanced-algorithms  Python-based implementations of algorithms for learning on imbalanced data.
Random Forests
Extreme Learning Machine

Python-ELM  Extreme Learning Machine implementation in Python.

Python Extreme Learning Machine (ELM)  A machine learning technique used for classification/regression tasks.

hpelm  High performance implementation of Extreme Learning Machines (fast randomized neural networks).
Kernel Methods

pyFM  Factorization machines in Python.

fastFM  A library for Factorization Machines.

tffm  TensorFlow implementation of an arbitrary order Factorization Machine.

liquidSVM  An implementation of SVMs.

scikit-rvm  Relevance Vector Machine implementation using the scikit-learn API.

ThunderSVM  A fast SVM Library on GPUs and CPUs.
Gradient Boosting

XGBoost  Scalable, Portable and Distributed Gradient Boosting.

LightGBM  A fast, distributed, high performance gradient boosting.

CatBoost  An open-source gradient boosting on decision trees library.

ThunderGBM  Fast GBDTs and Random Forests on GPUs.
Deep Learning
PyTorch

PyTorch  Tensors and Dynamic neural networks in Python with strong GPU acceleration.

torchvision  Datasets, Transforms and Models specific to Computer Vision.

torchtext  Data loaders and abstractions for text and NLP.

torchaudio  An audio library for PyTorch.

ignite  High-level library to help with training neural networks in PyTorch.

PyToune  A Keras-like framework and utilities for PyTorch.

skorch  A scikit-learn compatible neural network library that wraps PyTorch.

PyTorchNet  An abstraction to train neural networks.

pytorch_geometric  Geometric Deep Learning Extension Library for PyTorch.

Catalyst  High-level utils for PyTorch DL & RL research.

pytorch_geometric_temporal  Temporal Extension Library for PyTorch Geometric.
TensorFlow

TensorFlow  Computation using data flow graphs for scalable machine learning by Google.

TensorLayer  Deep Learning and Reinforcement Learning Library for Researchers and Engineers.

TFLearn  Deep learning library featuring a higher-level API for TensorFlow.

Sonnet  TensorFlow-based neural network library.

tensorpack  A Neural Net Training Interface on TensorFlow.

Polyaxon  A platform that helps you build, manage and monitor deep learning models.

NeuPy  A Python library for Artificial Neural Networks and Deep Learning.

tfdeploy  Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running NumPy.

tensorflow-upstream  TensorFlow ROCm port.

TensorFlow Fold  Deep learning with dynamic computation graphs in TensorFlow.

tensorlm  Wrapper library for text generation / language models at char and word level with RNN.

TensorLight  A high-level framework for TensorFlow.

Mesh TensorFlow  Model Parallelism Made Easier.

Ludwig  A toolbox for training and testing deep learning models without writing code.

Keras  A high-level neural networks API running on top of TensorFlow.

keras-contrib  Keras community contributions.

Hyperas  Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization.

Elephas  Distributed Deep learning with Keras & Spark.

Hera  Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser.

Spektral  Deep learning on graphs.

qkeras  A quantization deep learning library.
MXNet

MXNet  Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler.

Gluon  A clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet).

MXbox  Simple, efficient and flexible vision toolbox for the MXNet framework.

gluon-cv  Provides implementations of the state-of-the-art deep learning models in computer vision.

gluon-nlp  NLP made easy.

Xfer  Transfer Learning library for Deep Neural Networks.

MXNet  HIP Port of MXNet.
Others

Tangent  Source-to-Source Debuggable Derivatives in Pure Python.

autograd  Efficiently computes derivatives of NumPy code.

Myia  Deep Learning framework (pre-alpha).

nnabla  Neural Network Libraries by Sony.

Caffe  A fast open framework for deep learning.

hipCaffe  The HIP port of Caffe.
DISCONTINUED PROJECTS
Web Scraping

BeautifulSoup: The easiest library to scrape static websites for beginners

Scrapy: Fast and extensible scraping library. Lets you write rules and create a customized scraper without touching the core.

Selenium: Use Selenium Python API to access all functionalities of Selenium WebDriver in an intuitive way like a real user.

Pattern: High-level scraping for well-established websites such as Google, Twitter, and Wikipedia. Also has NLP, machine learning algorithms, and visualization.

twitterscraper: Efficient library to scrape Twitter.
Data Manipulation
Data Containers

pandas  Powerful Python data analysis toolkit.

pandas_profiling  Create HTML profiling reports from pandas DataFrame objects

cuDF  GPU DataFrame Library.

blaze  NumPy and pandas interface to Big Data.

pandasql  Allows you to query pandas DataFrames using SQL syntax.

pandas-gbq  pandas Google Big Query.

xpandas  Universal 1d/2d data containers with Transformers functionality for data analysis by The Alan Turing Institute.

pysparkling  A pure Python implementation of Apache Spark's RDD and DStream interfaces.

Arctic  High performance datastore for time series and tick data.

datatable  Data.table for Python.

koalas  pandas API on Apache Spark.

modin  Speed up your pandas workflows by changing a single line of code.

swifter  A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner.

pandas_flavor  A package that lets you easily write your own flavor of pandas.

pandas-log  A package that provides feedback about basic pandas operations and helps find both business logic and performance issues.

vaex  Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second.
Pipelines

pdpipe  Easy pipelines for pandas DataFrames.

SSPipe  Python pipe (|) operator with support for DataFrames, NumPy, and PyTorch.

pandas-ply  Functional data manipulation for pandas.

Dplython  Dplyr for Python.

sklearn-pandas  pandas integration with sklearn.

Dataset  Helps you conveniently work with random or sequential batches of your data and define data processing.

pyjanitor  Clean APIs for data cleaning.

meza  A Python toolkit for processing tabular data.

Prodmodel  Build system for data science pipelines.

dovpanda  Hints and tips for using pandas in an analysis environment.

CircleCI: Automates your software builds, tests, and deployments.
Feature Engineering
General

Featuretools  Automated feature engineering.

skl-groups  A scikit-learn addon to operate on set/"group"-based features.

Feature Forge  A set of tools for creating and testing machine learning features.

few  A feature engineering wrapper for sklearn.

scikit-mdr  A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction.

tsfresh  Automatic extraction of relevant features from time series.
Feature Selection

scikit-feature  Feature selection repository in Python.

boruta_py  Implementations of the Boruta all-relevant feature selection method.

BoostARoota  A fast xgboost feature selection algorithm.

scikit-rebate  A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.
Visualization
General Purposes

Matplotlib  Plotting with Python.

seaborn  Statistical data visualization using matplotlib.

prettyplotlib  Painlessly create beautiful matplotlib plots.

python-ternary  Ternary plotting library for Python with matplotlib.

missingno  Missing data visualization module for Python.

chartify  Python library that makes it easy for data scientists to create charts.

physt  Improved histograms.
Interactive plots

animatplot  A Python package for animating plots, built on matplotlib.

plotly  A Python library that makes interactive and publication-quality graphs.

Bokeh  Interactive Web Plotting for Python.

Altair  Declarative statistical visualization library for Python. Supports many data transformations within the code that creates a graph.

bqplot  Plotting library for IPython/Jupyter notebooks.

pyecharts  A Python port of Echarts, a charting and visualization library, for interactive plotting.
Map

folium  Makes it easy to visualize data on an interactive open street map

geemap  Python package for interactive mapping with Google Earth Engine (GEE)
Automatic Plotting

HoloViews  Stop plotting your data: annotate your data and let it visualize itself.

AutoViz: Visualize data automatically with 1 line of code (ideal for machine learning)

SweetViz: Visualize and compare datasets, target values and associations, with one line of code.
NLP

pyLDAvis: Interactive topic model visualization.
Deployment

datapane  A collection of APIs to turn scripts and notebooks into interactive reports.

binder  Enables sharing and executing Jupyter Notebooks.

fastapi  Modern, fast (high-performance) web framework for building APIs with Python.

streamlit  Makes it easy to deploy machine learning models.
Model Explanation

Shapley  A data-driven framework to quantify the value of classifiers in a machine learning ensemble.

Alibi  Algorithms for monitoring and explaining machine learning models.

anchor  Code for "High-Precision Model-Agnostic Explanations" paper.

aequitas  Bias and Fairness Audit Toolkit.

Contrastive Explanation  Contrastive Explanation (Foil Trees).

yellowbrick  Visual analysis and diagnostic tools to facilitate machine learning model selection.

scikit-plot  An intuitive library to add plotting functionality to scikit-learn objects.

shap  A unified approach to explain the output of any machine learning model.

ELI5  A library for debugging/inspecting machine learning classifiers and explaining their predictions.

Lime  Explaining the predictions of any machine learning classifier.

FairML  A Python toolbox for auditing machine learning models for bias.

L2X  Code for replicating the experiments in the paper Learning to Explain: An Information-Theoretic Perspective on Model Interpretation.

PDPbox  Partial dependence plot toolbox.

pyBreakDown  Python implementation of R package breakDown.

PyCEbox  Python Individual Conditional Expectation Plot Toolbox.

Skater  Python Library for Model Interpretation.

model-analysis  Model analysis tools for TensorFlow.

themis-ml  A library that implements fairness-aware machine learning algorithms.

treeinterpreter  Interpreting scikit-learn's decision tree and random forest predictions.

AI Explainability 360  Interpretability and explainability of data and machine learning models.

Auralisation  Auralisation of learned features in CNN (for audio).

CapsNet-Visualization  A visualization of the CapsNet layers to better understand how it works.

lucid  A collection of infrastructure and tools for research in neural network interpretability.

Netron  Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks).

FlashLight  Visualization Tool for your NeuralNetwork.

tensorboard-pytorch  TensorBoard for PyTorch (and Chainer, MXNet, NumPy, ...).

mxboard  Logging MXNet data for visualization in TensorBoard.
Reinforcement Learning

OpenAI Gym  A toolkit for developing and comparing reinforcement learning algorithms.

Coach  Easy experimentation with state-of-the-art Reinforcement Learning algorithms.

garage  A toolkit for reproducible reinforcement learning research.

OpenAI Baselines  High-quality implementations of reinforcement learning algorithms.

Stable Baselines  A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines.

RLlib  Scalable Reinforcement Learning.

Horizon  A platform for Applied Reinforcement Learning.

TF-Agents  A library for Reinforcement Learning in TensorFlow.

TensorForce  A TensorFlow library for applied reinforcement learning.

TRFL  TensorFlow Reinforcement Learning.

Dopamine  A research framework for fast prototyping of reinforcement learning algorithms.

keras-rl  Deep Reinforcement Learning for Keras.

ChainerRL  A deep reinforcement learning library built on top of Chainer.
Probabilistic Methods

pomegranate  Probabilistic and graphical models for Python.

pyro  A flexible, scalable deep probabilistic programming library built on PyTorch.

ZhuSuan  Bayesian Deep Learning.

PyMC  Bayesian Stochastic Modelling in Python.

PyMC3  Python package for Bayesian statistical modeling and Probabilistic Machine Learning.

sampled  Decorator for reusable models in PyMC3.

Edward  A library for probabilistic modeling, inference, and criticism.

InferPy  Deep Probabilistic Modelling Made Easy.

GPflow  Gaussian processes in TensorFlow.

PyStan  Bayesian inference using the No-U-Turn sampler (Python interface).

sklearn-bayes  Python package for Bayesian Machine Learning with scikit-learn API.

skggm  Estimation of general graphical models.

pgmpy  A Python library for working with Probabilistic Graphical Models.

skpro  Supervised domain-agnostic prediction framework for probabilistic modelling by The Alan Turing Institute.

Aboleth  A bare-bones TensorFlow framework for Bayesian deep learning and Gaussian process approximation.

PtStat  Probabilistic Programming and Statistical Inference in PyTorch.

PyVarInf  Bayesian Deep Learning methods with Variational Inference for PyTorch.

emcee  The Python ensemble sampling toolkit for affine-invariant MCMC.

hsmmlearn  A library for hidden semi-Markov models with explicit durations.

pyhsmm  Bayesian inference in HSMMs and HMMs.

GPyTorch  A highly efficient and modular implementation of Gaussian Processes in PyTorch.

MXFusion  Modular Probabilistic Programming on MXNet.

sklearn-crfsuite  A scikit-learn-inspired API for CRFsuite.
Genetic Programming

gplearn  Genetic Programming in Python.

DEAP  Distributed Evolutionary Algorithms in Python.

karoo_gp  A Genetic Programming platform for Python with GPU support.

monkeys  A strongly-typed genetic programming framework for Python.

sklearn-genetic  Genetic feature selection module for scikit-learn.
Optimization

Spearmint  Bayesian optimization.

BoTorch  Bayesian optimization in PyTorch.

scikit-opt  Heuristic Algorithms for optimization.

SMAC3  Sequential Model-based Algorithm Configuration.

Optunity  A library containing various optimizers for hyperparameter tuning.

hyperopt  Distributed Asynchronous Hyperparameter Optimization in Python.

hyperopt-sklearn  Hyperparameter optimization for sklearn.

sklearn-deap  Use evolutionary algorithms instead of grid search in scikit-learn.

sigopt_sklearn  SigOpt wrappers for scikit-learn methods.

Bayesian Optimization  A Python implementation of global optimization with Gaussian processes.

SafeOpt  Safe Bayesian Optimization.

scikit-optimize  Sequential model-based optimization with a scipy.optimize interface.

Solid  A comprehensive gradient-free optimization framework written in Python.

PySwarms  A research toolkit for particle swarm optimization in Python.

Platypus  A Free and Open Source Python Library for Multiobjective Optimization.

GPflowOpt  Bayesian Optimization using GPflow.

POT  Python Optimal Transport library.

Talos  Hyperparameter Optimization for Keras Models.

nlopt  Library for nonlinear optimization (global and local, constrained or unconstrained).
Time Series

sktime  A unified framework for machine learning with time series.

tslearn  Machine learning toolkit dedicated to time-series data.

tick  Module for statistical learning, with a particular emphasis on time-dependent modelling.

Prophet  Automatic Forecasting Procedure.

PyFlux  Open source time series library for Python.

bayesloop  Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.

luminol  Anomaly Detection and Correlation library.

dateutil  Powerful extensions to the standard datetime module

maya  Makes it very easy to parse a string and to change timezones.
Natural Language Processing

NLTK  Modules, data sets, and tutorials supporting research and development in Natural Language Processing.

CLTK  The Classical Language Toolkit.

gensim  Topic Modelling for Humans.

PSIToolkit  A natural language processing toolkit.

pyMorfologik  Python binding for Morfologik.

skift  scikit-learn wrappers for Python fastText.

Phonemizer  Simple text to phonemes converter for multiple languages.

flair  Very simple framework for state-of-the-art NLP.

spaCy  Industrial-Strength Natural Language Processing.
Computer Audition

librosa  Python library for audio and music analysis.

Yaafe  Audio features extraction.

aubio  A library for audio and music analysis.

Essentia  Library for audio and music analysis, description and synthesis.

LibXtract  A simple, portable, lightweight library of audio feature extraction functions.

Marsyas  Music Analysis, Retrieval and Synthesis for Audio Signals.

muda  A library for augmenting annotated audio data.

madmom  Python audio and music signal processing library.
Computer Vision

OpenCV  Open Source Computer Vision Library.

scikit-image  Image Processing SciKit (Toolbox for SciPy).

imgaug  Image augmentation for machine learning experiments.

imgaug_extension  Additional augmentations for imgaug.

Augmentor  Image augmentation library in Python for machine learning.

albumentations  Fast image augmentation library and easy-to-use wrapper around other libraries.
Statistics

pandas_summary  Extension to pandas dataframes describe function.

Pandas Profiling  Create HTML profiling reports from pandas DataFrame objects.

statsmodels  Statistical modeling and econometrics in Python.

stockstats  Supply a wrapper StockDataFrame based on the pandas.DataFrame with inline stock statistics/indicators support.

weightedcalcs  A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.

scikit-posthocs  Pairwise Multiple Comparisons Post-hoc Tests.

Alphalens  Performance analysis of predictive (alpha) stock factors.
Distributed Computing

Horovod  Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

PySpark  Exposes the Spark programming model to Python.

Veles  Distributed machine learning platform.

Jubatus  Framework and Library for Distributed Online Machine Learning.

DMTK  Microsoft Distributed Machine Learning Toolkit.

PaddlePaddle  PArallel Distributed Deep LEarning.

dask-ml  Distributed and parallel machine learning.

Distributed  Distributed computation in Python.
Experimentation

Sacred  A tool to help you configure, organize, log and reproduce experiments.

Xcessiv  A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling.

Persimmon  A visual dataflow programming language for sklearn.

Ax  Adaptive Experimentation Platform.

Neptune  A lightweight ML experiment tracking, results visualization and management tool.
Evaluation

recmetrics  Library of useful metrics and plots for evaluating recommender systems.

Metrics  Machine learning evaluation metrics.

sklearn-evaluation  Model evaluation made easy: plots, tables and markdown reports.

AI Fairness 360  Fairness metrics for datasets and ML models, explanations and algorithms to mitigate bias in datasets and models.
Computations

numpy  The fundamental package needed for scientific computing with Python.

Dask  Parallel computing with task scheduling.

bottleneck  Fast NumPy array functions written in C.

CuPy  NumPy-like API accelerated with CUDA.

scikit-tensor  Python library for multilinear algebra and tensor factorizations.

numdifftools  Solve automatic numerical differentiation problems in one or more variables.

quaternion  Add built-in support for quaternions to NumPy.

adaptive  Tools for adaptive and parallel sampling of mathematical functions.
Spatial Analysis

GeoPandas  Python tools for geographic data.

PySal  Python Spatial Analysis Library.
Quantum Computing

PennyLane  Quantum machine learning, automatic differentiation, and optimization of hybrid quantum-classical computations.

QML  A Python Toolkit for Quantum Machine Learning.
Conversion

sklearn-porter  Transpile trained scikit-learn estimators to C, Java, JavaScript and others.

ONNX  Open Neural Network Exchange.

MMdnn  A set of tools to help users interoperate among different deep learning frameworks.
Contributing
Contributions are welcome! :sunglasses:
Read the contribution guideline.
License
This work is licensed under the Creative Commons Attribution 4.0 International License  CC BY 4.0