Bioinformatics

Collection 4.0k stars GitHub

Package suites

BioPerl

International association of users and developers of open source Perl tools for bioinformatics, genomics and life sciences.

Biopython 4.9k updated 4mo ago

Freely available tools for biological computing in Python, with included cookbook, packaging and thorough documentation. Part of the Open Bioinformatics Foundation. Contains the very useful Entrez package for API access to the NCBI databases.

Rust-Bio 1.8k updated 2mo ago

Rust implementations of algorithms and data structures useful for bioinformatics.

SeqAn 448 updated 3mo ago

The modern C++ library for sequence analysis.

(Poly)merase 720 updated 1y ago

A Go library and command line utility for engineering organisms.

Biocaml 123 updated 7mo ago

Biocaml aims to be a high-performance user-friendly library for Bioinformatics.

Biojava 622 updated 7mo ago

Java framework for processing biological data.

Data Tools

Downloading

GGD 42 updated 3y ago

Go Get Data: a command line interface for obtaining genomic data.

SRA-Explorer 220 updated 1y ago

Easily get SRA download links and other information.

Compressing

Genozip 182 updated 3mo ago

A compressor of common genomic file formats (BAM, CRAM, FASTQ, VCF etc).

Data Processing

Command Line Utilities

Bioinformatics One Liners 2.0k updated 2y ago

Git repo of useful single line commands.

BioNode 313 updated 6y ago

Modular and universal bioinformatics. Bionode provides pipeable UNIX command line tools and JavaScript APIs for bioinformatics analysis workflows. [web ]

bioSyntax 271 updated 3y ago

Syntax Highlighting for Computational Biology file formats (SAM, VCF, GTF, FASTA, PDB, etc...) in vim/less/gedit/sublime. [paper-2018 | web ]

CSVKit 6.4k updated 4mo ago

Utilities for working with CSV/Tab-delimited files. [web ]

csvtk 1.2k updated 4mo ago

Another cross-platform, efficient, practical and pretty CSV/TSV toolkit. [web ]

easy_qsub 29 updated 3y ago

Easily submit PBS jobs with a script template. Multiple input files are supported.

grabix 86 updated 8y ago

A wee tool for random access into BGZF files.

grepq 58 updated 7mo ago

Fast FASTQ filtering by matching reads against one or more regex patterns.

gsort 36 updated 8mo ago

Sort genomic files according to a specified order.

tabix 91 (archived)

Table file index. [paper-2011 ]

wormtable 27 updated 2y ago

Write-once-read-many table for large datasets.

zindex 655 updated 3y ago

Create an index on a compressed text file.

Next Generation Sequencing

Workflow Managers

BigDataScript 92 updated 5y ago

A cross-system scripting language for working with big data pipelines in computer systems of different sizes and capabilities. [paper-2014 | web ]

Bpipe 239 updated 4mo ago

A small language for defining pipeline stages and linking them together to make pipelines. [web ]

Common Workflow Language 1.5k updated 6mo ago

A specification for describing analysis workflows and tools that are portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments. [web ]

Cromwell 1.1k updated 4mo ago

A Workflow Management System geared towards scientific workflows. [web ]

Nextflow (recommended) 3.3k updated 3mo ago

A fluent DSL modelled around the UNIX pipe concept, that simplifies writing parallel and scalable pipelines in a portable manner. [paper-2018 | web ]

redun 582 updated 3mo ago

A Python-based workflow manager.

Ruffus 175 updated 5y ago

Computation pipeline library for Python, widely used in science and bioinformatics. [paper-2010 | web ]

SciPipe 1.1k updated 1y ago

Workflow library embedded in the Go programming language, focusing on supporting complex workflow constructs, compiling to a single binary, and providing powerful file naming and comprehensive audit reports for every output. [paper-2019 | web ]

SeqWare 29 updated 8y ago

Hadoop Oozie-based workflow system focused on genomics data analysis in cloud environments. [paper-2010 | web ]

Workflow Description Language 27 (archived)

Workflow standard developed by the Broad. [web ]

Galaxy 1.8k updated 2mo ago

A popular open-source, web-based platform for data-intensive biomedical research. Offers features ranging from data analysis to workflow management to visualization tools. [paper-2018 | web ]

Pipelines

Awesome-Pipeline 6.6k updated 3mo ago

A list of pipeline resources.

Bactopia 508 updated 2mo ago

A flexible pipeline, built with Nextflow, for the complete analysis of bacterial genomes. [web ]

Bacannot 106 updated 6mo ago

A generic but comprehensive bacterial annotation pipeline, built with Nextflow, with nice graphical options for investigating results. [web ]

bcbio-nextgen 1.0k updated 1y ago

Batteries included genomic analysis pipeline for variant and RNA-Seq analysis, structural variant calling, annotation, and prediction. [web ]

R-Peridot 7 updated 6y ago

Customizable pipeline for differential expression analysis with an intuitive GUI. [web ]

ngs-preprocess 36 updated 2y ago

A pipeline for preprocessing short and long sequencing reads, built with Nextflow. [web ]

Sequence Processing

AfterQC 213 updated 6y ago

Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data. [paper-2017 ]

FastQC 589 updated 4mo ago

A quality control tool for high throughput sequence data. [web ]

Fastqp 108 updated 4mo ago

FASTQ and SAM quality control using Python.

Fastx Toolkit 199 updated 4y ago

FASTQ/A short-reads pre-processing tools: Demultiplexing, trimming, clipping, quality filtering, and masking utilities. [web ]

MultiQC 1.4k updated 2mo ago

Aggregate results from bioinformatics analyses across many samples into a single report. [paper-2016 | web ]

SeqFu 126 updated 4mo ago

Sequence manipulation toolkit for FASTA/FASTQ files written in Nim. [paper-2021 | web ]

SeqKit 1.5k updated 4mo ago

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang. [paper-2016 | web ]

seqmagick 117 updated 2y ago

Convenient file format conversion using Biopython. [web ]

Seqtk 1.5k updated 1y ago

Toolkit for processing sequences in FASTA/Q formats.

smof 17 updated 1y ago

UNIX-style FASTA manipulation tools.

Data Analysis

Hail 1.1k updated 3mo ago

Scalable genomic analysis.

GLNexus 179 updated 2y ago

Scalable gVCF merging and joint variant calling for population sequencing projects. [paper-2018 ]

Sequence Alignment

Bowtie 2 779 updated 4mo ago

An ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. [paper-2012 | web ]

BWA 1.7k updated 1y ago

Burrows-Wheeler Aligner for pairwise alignment between DNA sequences.

BWA-FastAlign 19 updated 4mo ago

BWA-MEM drop-in replacement: 2-3x faster, 2-5x cheaper, 100% identical output on standard CPUs. [paper-2026 ]

WFA 210 updated 4mo ago

The wavefront alignment algorithm (WFA), which exploits sequence similarity to speed up alignment. [paper-2020 ]

Parasail 276 updated 10mo ago

SIMD C library for global, semi-global, and local pairwise sequence alignments. [paper-2016 ]

MUMmer 551 updated 1y ago

A system for rapidly aligning entire genomes, whether in complete or draft form. [paper-1999 | paper-2002 | paper-2004 | web ]

DIAMOND 1.3k updated 2mo ago

An ultrafast protein aligner for blastp and blastx like searches. [paper-2021 ]

POA 75 updated 2y ago

Partial-Order Alignment for fast alignment and consensus of multiple homologous sequences. [paper-2002 ]

MMseqs2 2.0k updated 4mo ago

Ultra-fast, sensitive search and clustering suite for protein and nucleotide sequence sets. [paper-2017 | paper-2018 ]

Quantification

Cufflinks 321 updated 6y ago

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.

RSEM 466 updated 4mo ago

A software package for estimating gene and isoform expression levels from RNA-Seq data.

Variant Calling

DeepVariant 3.7k updated 4mo ago

Deep learning-based variant caller.

freebayes 865 updated 5mo ago

Bayesian haplotype-based polymorphism discovery and genotyping.

GATK 299 updated 8y ago

Variant Discovery in High-Throughput Sequencing Data.

Octopus 323 updated 5mo ago

A polymorphic Bayesian genotyping model with wide applicability.

VCF File Utilities

bcftools 850 updated 4mo ago

Set of tools for manipulating VCF files.

vcfanno 399 updated 10mo ago

Annotate a VCF with other VCFs/BEDs/tabixed files.

vcflib 669 updated 4mo ago

A C++ library for parsing and manipulating VCF files.

vcftools 551 updated 1y ago

VCF manipulation and statistics (e.g. linkage disequilibrium, allele frequency, Fst).

Structural variant callers

Delly 508 updated 4mo ago

Structural variant discovery by integrated paired-end and split-read analysis.

lumpy 341 updated 4mo ago

lumpy: a general probabilistic framework for structural variant discovery.

manta 460 (archived)

Structural variant and indel caller for mapped sequencing data.

gridss 282 updated 1y ago

GRIDSS: the Genomic Rearrangement IDentification Software Suite.

smoove 264 updated 2y ago

Structural variant calling and genotyping with existing tools, but smoothly.

BAM File Utilities

Bamtools 429 updated 1y ago

Collection of tools for working with BAM files.

bam toolbox 1 updated 2y ago

MtDNA:Nuclear Coverage; BAM Toolbox can output the ratio of MtDNA:nuclear coverage, a proxy for mitochondrial content.

mergesam 7 updated 13y ago

Automate common SAM & BAM conversions.

mosdepth 847 updated 2mo ago

Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.

SAMstat 24 updated 3y ago

Displaying sequence statistics for next-generation sequencing.

Somalier 313 updated 2mo ago

Fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs.

Telseq 74 updated 7y ago

Telseq is a tool for estimating telomere length from whole genome sequence data.

GFF BED File Utilities

AGAT 570 updated 3mo ago

Suite of tools to handle gene annotations in any GTF/GFF format.

gffutils 314 updated 5mo ago

GFF and GTF file manipulation and interconversion.

BEDOPS 1.0k updated 1y ago

The fast, highly scalable and easily-parallelizable genome analysis toolkit.

Variant Simulation

Bam Surgeon 248 updated 1y ago

Tools for adding mutations to existing .bam files, used for testing mutation callers.

wgsim 283 updated 4y ago

Read simulator; comes with samtools.

Variant Prediction/Annotation

SIFT 537 updated 2y ago

Predicts whether an amino acid substitution affects protein function.

SnpEff 303 updated 4mo ago

Genetic variant annotation and effect prediction toolbox.

Python Modules

cruzdb 136 updated 5y ago

Pythonic access to the UCSC Genome database.

pyensembl 401 updated 2mo ago

Pythonic Access to the Ensembl database.

bioservices 335 updated 3mo ago

Access to Biological Web Services from Python.

cyvcf 53 updated 8y ago

A port of pyVCF using Cython for speed.

cyvcf2 435 updated 4mo ago

Cython + HTSlib == fast VCF parsing; even faster parsing than pyVCF.

pyBedTools 329 updated 1y ago

Python wrapper for bedtools.

pyfaidx 482 updated 4mo ago

Pythonic access to FASTA files.

pysam 884 updated 3mo ago

Python wrapper for samtools.

pyVCF 418 updated 2y ago

A VCF Parser for Python.

polars-bio 159 updated 2mo ago

Python library for fast genomic interval operations and genomic file format I/O on Polars DataFrames.

Scanpy 2.4k updated 2mo ago

Scalable toolkit for analyzing single-cell gene expression data, including preprocessing, visualization, clustering, and trajectory inference.

Assembly

SPAdes 919 updated 4mo ago

SPAdes (St. Petersburg genome assembler) is an assembly toolkit containing various assembly pipelines and the de-facto standard for prokaryotic genome assemblies.

SKESA 126 updated 1y ago

SKESA is a de-novo sequence read assembler for microbial genomes. It uses conservative heuristics and is designed to create breaks at repeat regions in the genome. This leads to excellent sequence quality without significantly compromising contiguity.

Minimap2 2.1k updated 5mo ago

Minimap2 is a pairwise aligner for genomic and spliced nucleotide sequences. It can perform assembly-to-assembly alignment and works with gzip'd FASTQ and FASTA files. It also finds overlaps between long reads.

Annotation

Prokka 962 updated 6mo ago

Prokka: rapid prokaryotic genome annotation. Prokka is one of the most cited annotation command line tools for microbial genome annotations.

Bakta 614 updated 5mo ago

Bakta is a tool for the rapid & standardized annotation of bacterial genomes & plasmids. It provides dbxref-rich and sORF-including annotations in machine-readable JSON & bioinformatics standard file formats for automatic downstream analysis.

Long-read sequencing

Long-read Assembly

canu 699 updated 4mo ago

A single molecule sequence assembler for genomes large and small.

flye 921 updated 3mo ago

De novo assembler for single molecule sequencing reads using repeat graphs.

hifiasm 756 updated 1y ago

A haplotype-resolved assembler for accurate HiFi reads.

wtdbg2 531 updated 2y ago

A fuzzy Bruijn graph approach to assembling long, noisy reads.

Visualization

Genome Browsers / Gene Diagrams

Squiggle 41 (archived)

Easy-to-use DNA sequence visualization tool that turns FASTA files into browser-based visualizations.

biodalliance 228 updated 6y ago

Embeddable genome viewer. Integrates data from a wide variety of sources, and can load data directly from popular genomics file formats including bigWig, BAM, and VCF.

BioJS 507 updated 4y ago

BioJS is a library of over a hundred JavaScript components enabling you to visualize and process data using current web technologies.

Circleator 46 updated 7y ago

Flexible circular visualization of genome-associated data with BioPerl and SVG.

DNAism 62 updated 10y ago

Horizon chart D3-based JavaScript library for DNA data.

IGV js 721 updated 4mo ago

JavaScript-based browser. Fast, efficient, scalable visualization tool for genomics data and annotations. Handles a large variety of formats.

Island Plot 33 updated 11y ago

D3 JavaScript-based genome viewer. Constructs SVGs.

JBrowse 473 updated 4mo ago

JavaScript genome browser that is highly customizable via plugins and track customizations.