
Data Science

Collection 28.7k stars GitHub

The Data Science Toolbox

Deep Learning Packages

PyTorch 98.5k updated 2d ago
torchvision 17.6k updated yesterday
torchtext 3.6k (archived)
torchaudio 2.8k updated yesterday
ignite 4.7k updated yesterday
PyTorchNet 1.7k updated 2d ago
PyToune 579 updated 10mo ago
skorch 6.2k updated 28d ago
PyVarInf 362 updated 6y ago
pytorch_geometric 23.6k updated 3d ago
GPyTorch 3.9k updated 13d ago
pyro 9.0k updated 8mo ago
Catalyst 3.4k updated 9mo ago
pytorch_tabular 1.6k updated yesterday
Yolov3 10.6k updated 7d ago
Yolov5 57.1k updated 7d ago
Yolov8 54.9k updated yesterday
TensorFlow 194.4k updated yesterday
TensorLayer 7.4k updated 3y ago
TFLearn 9.6k updated 1y ago
Sonnet 9.9k updated 1mo ago
tensorpack 6.3k updated 2y ago
TRFL 3.1k updated 3y ago
NeuPy 734 updated 1y ago
tfdeploy 355 updated 1y ago
tensorflow-upstream
TensorFlow Fold 1.8k (archived)
tensorlm 60 updated 3y ago
TensorLight 11 updated 3y ago
Mesh TensorFlow 1.6k (archived)
Ludwig 11.7k updated 9d ago
TF-Agents 3.0k updated 2mo ago
TensorForce 3.3k updated 1y ago
keras-contrib 1.6k (archived)
Hyperas 2.2k updated 3y ago
Elephas 1.6k updated 2y ago
Hera 490 updated 8y ago
Spektral 2.4k updated 2y ago
qkeras 577 updated 1mo ago
keras-rl 5.6k updated 2y ago
Talos 1.6k updated 1y ago
Netron 32.6k updated yesterday
Resseract Lite 7 updated 1y ago
vizzu 2.0k updated 8d ago
TensorWatch 3.5k updated 8d ago
MetaReview

Free online meta-analysis platform with 11 interactive D3.js statistical charts (forest plot, funnel plot, Galbraith, L'Abbé, Baujat, etc.), 5 effect size measures, AI literature screening, and publication-ready report export.

Miscellaneous Tools

Polyaxon 3.7k updated 16d ago

A platform for reproducible and scalable machine learning and deep learning.

The Data Science Lifecycle Process 527 updated 4y ago

The Data Science Lifecycle Process is a process for taking data science teams from Idea to Value repeatedly and sustainably. The process is documented in this repo.

Data Science Lifecycle Template Repo 200 updated 5y ago

Template repository for data science lifecycle project

RexMex 276 updated 2y ago

A general purpose recommender metrics library for fair evaluation.

ChemicalX

A PyTorch based deep learning library for drug pair scoring.

PyTorch Geometric Temporal 3.0k updated 6mo ago

Representation learning on dynamic graphs.

Little Ball of Fur 713 updated 3mo ago

A graph sampling library for NetworkX with a Scikit-Learn like API.

Karate Club 2.3k updated 1y ago

An unsupervised machine learning extension library for NetworkX with a Scikit-Learn like API.

ML Workspace 3.5k updated 1y ago

All-in-one web-based IDE for machine learning and data science. The workspace is deployed as a Docker container and is preloaded with a variety of popular data science libraries (e.g., TensorFlow, PyTorch) and dev tools (e.g., Jupyter, VS Code).

xonsh shell 9.3k updated yesterday

A Python-powered shell that enables integration, management and orchestration of data science libraries mostly written in Python, allowing you to build pipelines, code and command-based workflows. It can also be used as a kernel for Jupyter Notebook.

steppy 136 (archived)

Lightweight Python library for fast and reproducible machine learning experimentation. Introduces a very simple interface that enables clean machine learning pipeline design.

steppy-toolkit 22 (archived)

Curated collection of the neural networks, transformers and models that make your machine learning work faster and more effective.

Pandas GUI 3.3k updated 9mo ago

A GUI for Pandas DataFrames

Polars 37.8k updated 2d ago

Fast DataFrame library for Rust and Python, designed as a faster alternative to Pandas

Hydrosphere Mist 325 updated 5y ago

A service for exposing Apache Spark analytics jobs and machine learning models as real-time, batch, or reactive web services.

Nervana's python based Deep Learning Framework 3.9k (archived)

Intel Nervana reference deep learning framework committed to best performance on all hardware.

Skale 397 (archived)

High performance distributed data processing in NodeJS

Aerosolve

A machine learning package built for humans.

Intel framework 313 (archived)

Intel Deep Learning Framework

Datawrapper 1.4k updated 1y ago

An open source data visualization platform helping everyone to create simple, correct and embeddable charts.

Featuretools 7.6k updated 1mo ago

An open source framework for automated feature engineering, written in Python

Optimus 1.5k updated 1y ago

Cleansing, pre-processing, feature engineering, exploratory data analysis and easy ML with PySpark backend.

Albumentations 15.3k (archived)

A fast and framework-agnostic image augmentation library that implements a diverse set of augmentation techniques. Supports classification, segmentation, and detection out of the box. It was used to win a number of deep learning competitions at Kaggle, Topcoder, and those that were part of the CVPR workshops.

DVC 15.5k updated 2d ago

Open-source version control system for machine learning projects

Lambdo 25 updated 5y ago

A workflow engine that significantly simplifies data analysis by combining, in one analysis pipeline, (i) feature engineering and machine learning, (ii) model training and prediction, and (iii) table population and column evaluation.

Feast 6.8k updated 2d ago

A feature store for the management, discovery, and access of machine learning features. Feast provides a consistent view of feature data for both model training and model serving.

Trains 6.6k updated 2d ago

Auto-Magical Experiment Manager, Version Control & DevOps for AI

Hopsworks 1.3k updated 1y ago

Open-source data-intensive machine learning platform with a feature store. Ingest and manage features for both online (MySQL Cluster) and offline (Apache Hive) access, train and serve models at scale.

MindsDB 38.8k updated 2d ago

MindsDB is an Explainable AutoML framework for developers. With MindsDB you can build, train, and use state-of-the-art ML models in as little as one line of code.

Lightwood 502 updated 1mo ago

A PyTorch-based framework that breaks machine learning problems down into smaller blocks that can be glued together seamlessly, with the objective of building predictive models with one line of code.

AWS Data Wrangler 4.1k updated 2d ago

An open-source Python package that extends the power of the Pandas library to AWS, connecting DataFrames and AWS data-related services (Amazon Redshift, AWS Glue, Amazon Athena, Amazon EMR, etc.).

CML 4.2k updated 9mo ago

An open source toolkit for using continuous integration in data science projects. Automatically train and test models in production-like environments with GitHub Actions & GitLab CI, and autogenerate visual reports on pull/merge requests.

DuckDB 36.9k updated 2d ago

An in-process SQL OLAP database management system

IJulia 2.9k updated 15d ago

A Julia-language backend combined with the Jupyter interactive environment

Apache Airflow 44.8k updated yesterday

Platform to programmatically author, schedule, and monitor workflows

Prefect 21.9k updated yesterday

Workflow management system for modern data stacks

Kedro 10.8k updated 2d ago

Open-source Python framework for creating reproducible, maintainable data science code

Hamilton 2.4k updated 2d ago

Lightweight library to author and manage reliable data transformations

SHAP 25.2k updated 13d ago

Game theoretic approach to explain the output of any machine learning model

InterpretML 6.8k updated 2d ago

InterpretML implements the Explainable Boosting Machine (EBM), a modern, fully interpretable machine learning model based on Generalized Additive Models (GAMs). This open-source package also provides visualization tools for EBMs, other glass-box models, and black-box explanations.

LIME 12.1k updated 1y ago

Explaining the predictions of any machine learning classifier

flyte 6.9k updated yesterday

Workflow automation platform for machine learning

dbt 12.5k updated yesterday

Data build tool

zasper 2.3k updated 16d ago

Supercharged IDE for Data Science

skrub 1.6k updated 2d ago

A Python library to ease preprocessing and feature engineering for tabular machine learning

Chinese-Elite 68 updated yesterday

An open-source project that automatically maps relationship networks by parsing public data using LLMs and visualizes it as an interactive graph.

dna-claude-analysis 25 updated 20d ago

Personal genome analysis toolkit with Python scripts analyzing raw DNA data across 17 categories (health risks, ancestry, pharmacogenomics, nutrition, psychology, and more) and generating a terminal-style single-page HTML visualization.

RunMat

Fast MATLAB-syntax runtime with automatic CPU/GPU execution and fused array kernels.

Turbostream 16 updated 1mo ago

A terminal UI for experimenting with custom rule engines and selective LLM analysis on real-time data streams, without worrying about streaming infra or backpressure.

WFGY ProblemMap 1.7k updated yesterday

Open source “failure atlas” of 16 recurring issues in LLM and RAG pipelines, with observable symptoms and suggested fixes for data science teams.

DeepAnalyze 3.9k updated yesterday

An agentic LLM for autonomous data science that can complete a wide range of data science tasks without human intervention.

Python Data Science Handbook 47.1k updated 1y ago

Python Data Science Handbook: full text in Jupyter Notebooks

Shapley 224 updated 2mo ago

A data-driven framework to quantify the value of classifiers in a machine learning ensemble.

Towhee 3.5k updated 1y ago

A Python library that helps you encode your unstructured data into embeddings.

LineaPy 669 updated 1y ago

Ever been frustrated with cleaning up long, messy Jupyter notebooks? With LineaPy, an open source Python library, it takes as little as two lines of code to transform messy development code into production pipelines.

envd 2.2k updated 6d ago

Machine learning development environment for data science and AI/ML engineering teams

MLEM 718 (archived)

Version and deploy your ML models following GitOps principles

cleanlab 11.4k updated 2mo ago

Python library for data-centric AI and automatically detecting various issues in ML datasets

AutoGluon 10.1k updated 2d ago

AutoML to easily produce accurate predictions for image, text, tabular, time-series, and multi-modal data

Comet 171 updated 6d ago

An MLOps platform with experiment tracking, model production management, a model registry, and full data lineage to support your ML workflow from training straight through to production.

Opik 18.5k updated yesterday

Evaluate, test, and ship LLM applications across your dev and production lifecycles.

teeplot 12 updated 1y ago

Workflow tool to automatically organize data visualization output

Streamlit 44.0k updated yesterday

App framework for Machine Learning and Data Science projects

Gradio 42.1k updated yesterday

Create customizable UI components around machine learning models

Weights & Biases 10.9k updated 2d ago

Experiment tracking, dataset versioning, and model management

Optuna 13.8k updated today

Automatic hyperparameter optimization software framework

Ray Tune 41.8k updated 2d ago

Scalable hyperparameter tuning library

Chaos Genius 775 (archived)

ML powered analytics engine for outlier/anomaly detection and root cause analysis