Computational Biology

Computational approaches applied to problems in biology.

Databases

CZ CELLxGENE

Single-cell dataset repository and interactive explorer from the Chan Zuckerberg Initiative.

Gene Expression Omnibus

Public functional genomics database.

Single Cell Portal

Public repository of single-cell RNA-seq studies, hosted by the Broad Institute.

Single Cell Expression Atlas

Public database of single-cell RNA-seq gene expression across species, hosted by EMBL-EBI.

Drug Repurposing Hub

Curated collection of drug repurposing data (drugs, mechanisms of action, targets, etc.).

PathwayCommons

Database of pathways and interactions.

WikiPathways

Database of biological pathways.

Reactome

Expert-curated, peer-reviewed pathway database with detailed reaction mechanisms.

BioCyc

Collection of pathway/genome databases across thousands of organisms.

SIGNOR

Database of causal signaling interactions and pathways.

MSigDB (Molecular Signatures Database)

Curated gene sets derived from pathways and biological processes.

MassBank

Open-source database and tools for mass spectrometry reference spectra.

MoNA (MassBank of North America)

Meta-database of metabolite mass spectra, metadata, and associated compounds.

The Human Protein Atlas

Comprehensive human protein database (cells, tissues, organs).

Protein Data Bank (PDB)

3D structures of proteins, nucleic acids, and their complexes.

UniProt

Functional information on proteins.

AlphaFold Protein Structure Database

3D protein structure predictions.

RCSB Protein Data Bank

Repository for structural data of biological molecules.

Critical Assessment of Structure Prediction (CASP)

Community-wide blind experiment that assesses methods for protein structure prediction.

Uniclust

Clustered protein sequence databases.

CATH database

Hierarchical classification of protein domain structures.

SAbDab

Structural Antibody Database containing all antibody structures in the PDB.

OAS (Observed Antibody Space)

Database of antibody sequences from immune repertoire sequencing.
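
The "hierarchical classification" in the CATH entry above refers to its four top levels: Class, Architecture, Topology, and Homologous superfamily, written as a dotted code. The snippet below is a minimal sketch of how such a code could be split into its levels. The names `CathNode`, `parse_cath_code`, and `ancestors` are illustrative and not part of any CATH client library, and the code `3.40.50.300` is used only as an example input.

```python
from typing import NamedTuple


class CathNode(NamedTuple):
    """The four top levels of a CATH classification code."""
    class_: int
    architecture: int
    topology: int
    homologous_superfamily: int


def parse_cath_code(code: str) -> CathNode:
    """Split a dotted code such as '3.40.50.300' into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {code!r}")
    return CathNode(*(int(p) for p in parts))


def ancestors(code: str) -> list[str]:
    """Return the code of every level, from class down to the full code."""
    parts = code.strip().split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


node = parse_cath_code("3.40.50.300")
print(node)
print(ancestors("3.40.50.300"))
```

Walking the `ancestors` list from left to right moves from the broadest grouping (class) to the most specific one (homologous superfamily), which is how two domains can be compared level by level.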

Machine Learning Tasks and Models

Multi-Omics Foundation Models

scMulan

Multitask generative language model for single-cell analysis, pretrained on ~10M single-cell transcriptomes together with their metadata.

MultiVI

Multi-modal variational autoencoder for integrating paired and unpaired single-cell RNA-seq and ATAC-seq measurements into a unified latent space.

MIRA

Probabilistic multimodal topic model jointly modeling single-cell transcriptomics and chromatin accessibility for regulatory network inference.

GLUE

Graph-Linked Unified Embedding framework for unpaired single-cell multi-omics data integration across RNA, ATAC, methylation, and protein modalities.

BABEL

Cross-modality translation model enabling prediction between scRNA-seq and scATAC-seq profiles without requiring paired single-cell measurements.

Multigrate

Asymmetric multi-omics variational autoencoder for integrating single-cell data across RNA, ATAC, and protein modalities with missing-modality support.

MOFA+

Multi-Omics Factor Analysis framework identifying shared axes of variation across bulk and single-cell datasets including RNA, ATAC, proteomics, methylation, and copy number.

GeneCompass

Large-scale foundation model integrating DNA regulatory sequences and single-cell transcriptomics from 120M+ cells across multiple species for gene regulation prediction.

UnitedNet

Interpretable multi-task deep neural network for single-cell multi-omics integration spanning transcriptomics, chromatin accessibility, and proteomics.

SpatialGlue

Graph attention network for spatial multi-omics integration jointly embedding spatial transcriptomics with chromatin accessibility or proteomics.

MIDAS

Mosaic integration and knowledge transfer model for single-cell multi-omics data that handles arbitrary missing-modality combinations across transcriptomics, chromatin accessibility, and proteomics.

Protein Foundation Models

ESMFold (archived)

Fast protein structure prediction using language model embeddings.

ChemBERTa-2

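Transformer language model pretrained on SMILES strings for small-molecule embeddings and property prediction (a chemical rather than a protein model).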

AlphaFold3

Predicts structures of proteins, nucleic acids, small molecules, and their complexes.

Boltz-1

Open-source all-atom biomolecular structure prediction model for proteins, nucleic acids, small molecules, and their complexes, with reported accuracy approaching that of AlphaFold3.

Chai-1

Unified molecular structure prediction model covering proteins, nucleic acids, small molecules, and complexes.

ESM3

Multimodal protein language model that jointly reasons over sequence, structure, and function for generative protein design and engineering.

RFdiffusion

Generative model for protein backbone design using diffusion.

ProteinMPNN

Deep learning model for protein sequence design given backbone structure.

OmegaFold

High-resolution de novo protein structure prediction from sequence.

RoseTTAFold

Three-track neural network for protein structure prediction.

OpenFold

Trainable, memory-efficient open-source reproduction of AlphaFold2 enabling custom protein structure prediction workflows.

SaProt

Protein language model built on a structure-aware vocabulary whose tokens encode both residue identity and local backbone geometry, for improved function prediction.

EvoDiff

Discrete diffusion framework for protein sequence generation trained on evolutionary-scale data, supporting unconditional generation, disordered region design, and functional motif scaffolding.
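
The "discrete diffusion" mentioned for EvoDiff can be pictured through its forward (corruption) half. The snippet below is a minimal sketch of an absorbing-state corruption process, assuming one residue is masked per step in a random order. The `MASK` symbol, the `forward_mask` function, and the toy sequence are illustrative and not taken from EvoDiff's code. A trained model would learn to reverse these steps, filling masked positions back in.

```python
import random

MASK = "#"  # hypothetical mask symbol; real models use a dedicated token id


def forward_mask(sequence: str, steps: int, rng: random.Random) -> list[str]:
    """Return the corrupted sequence after each of `steps` masking steps.

    Each step replaces one not-yet-masked residue, visited in a random
    order, with MASK. After len(sequence) steps every position is masked.
    """
    if not 0 <= steps <= len(sequence):
        raise ValueError("steps must be between 0 and len(sequence)")
    order = list(range(len(sequence)))
    rng.shuffle(order)  # random order in which positions are corrupted
    tokens = list(sequence)
    trajectory = []
    for position in order[:steps]:
        tokens[position] = MASK
        trajectory.append("".join(tokens))
    return trajectory


seq = "MKTAYIAKQR"  # toy sequence, for illustration only
trajectory = forward_mask(seq, steps=len(seq), rng=random.Random(0))
for t, state in enumerate(trajectory, start=1):
    print(t, state)
```

Passing an explicit `random.Random` instance keeps the corruption order reproducible, which is convenient when comparing runs of a toy example like this one.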