Bioinformatics
Package suites
International association of users & developers of open source Perl tools for bioinformatics, genomics and life sciences.
Freely available tools for biological computing in Python, with included cookbook, packaging and thorough documentation. Part of the Open Bioinformatics Foundation. Contains the very useful Entrez package for API access to the NCBI databases.
Rust implementations of algorithms and data structures useful for bioinformatics.
Data Tools
Downloading
Data Processing
Command Line Utilities
Modular and universal bioinformatics. Bionode provides pipeable UNIX command line tools and JavaScript APIs for bioinformatics analysis workflows. [web ]
Syntax highlighting for computational biology file formats (SAM, VCF, GTF, FASTA, PDB, etc.) in vim/less/gedit/sublime. [paper-2018 | web ]
Another cross-platform, efficient, practical and pretty CSV/TSV toolkit. [web ]
Next Generation Sequencing
Workflow Managers
A cross-system scripting language for working with big data pipelines in computer systems of different sizes and capabilities. [paper-2014 | web ]
A small language for defining pipeline stages and linking them together to make pipelines. [web ]
A specification for describing analysis workflows and tools that are portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments. [web ]
A Workflow Management System geared towards scientific workflows. [web ]
A fluent DSL modelled around the UNIX pipe concept, that simplifies writing parallel and scalable pipelines in a portable manner. [paper-2018 | web ]
Computation pipeline library for Python, widely used in science and bioinformatics. [paper-2010 | web ]
Workflow library embedded in the Go programming language, focusing on supporting complex workflow constructs, compiling to a single binary, providing powerful file naming and comprehensive audit reports for every output. [paper-2019 | web ]
Pipelines
A flexible pipeline, built with Nextflow, for the complete analysis of bacterial genomes. [web ]
A generic but comprehensive bacterial annotation pipeline, built with Nextflow, with nice graphical options for investigating results. [web ]
Batteries included genomic analysis pipeline for variant and RNA-Seq analysis, structural variant calling, annotation, and prediction. [web ]
Sequence Processing
Automatic filtering, trimming, error removal, and quality control for FASTQ data. [paper-2017 ]
FASTQ/A short-reads pre-processing tools: Demultiplexing, trimming, clipping, quality filtering, and masking utilities. [web ]
Aggregate results from bioinformatics analyses across many samples into a single report. [paper-2016 | web ]
Sequence manipulation toolkit for FASTA/FASTQ files written in Nim. [paper-2021 | web ]
Data Analysis
Sequence Alignment
An ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. [paper-2012 | web ]
BWA-MEM drop-in replacement: 2-3x faster, 2-5x cheaper, 100% identical output on standard CPUs. [paper-2026 ]
The wavefront alignment algorithm (WFA), which exploits sequence similarity to speed up alignment. [paper-2020 ]
SIMD C library for global, semi-global, and local pairwise sequence alignments. [paper-2016 ]
A system for rapidly aligning entire genomes, whether in complete or draft form. [paper-1999 | paper-2002 | paper-2004 | web ]
An ultrafast protein aligner for blastp and blastx like searches. [paper-2021 ]
Quantification
Variant Calling
VCF File Utilities
Structural variant callers
BAM File Utilities
GFF BED File Utilities
Variant Simulation
Variant Prediction/Annotation
Python Modules
Assembly
SPAdes (St. Petersburg genome assembler) is an assembly toolkit containing various assembly pipelines; it is the de facto standard for prokaryotic genome assemblies.
Long-read sequencing
Visualization
Genome Browsers / Gene Diagrams
Easy-to-use DNA sequence visualization tool that turns FASTA files into browser-based visualizations.
Embeddable genome viewer. Integrates data from a wide variety of sources, and can load data directly from popular genomics file formats including bigWig, BAM, and VCF.
BioJS is a library of over a hundred JavaScript components enabling you to visualize and process data using current web technologies.
Flexible circular visualization of genome-associated data with BioPerl and SVG.
Java-based browser. Fast, efficient, scalable visualization tool for genomics data and annotations. Handles a large variety of formats.
JavaScript genome browser that is highly customizable via plugins and track customizations.
Point-and-click, cross-platform suite for analysing and visualizing next-generation sequencing datasets.