Biomedical Information Extraction
How to extract information from unstructured biomedical data and text.
Contents
Code Libraries
Python tools primarily intended for bioinformatics and computational molecular biology purposes, but also a convenient way to obtain data, including documents/abstracts from PubMed (see Chapter 9 of the documentation).
A framework for biomedical coreference resolution.
A system for building predictive medical natural language processing models. Built on the spaCy framework.
A version of the spaCy framework for scientific and biomedical documents.
R utilities for accessing NCBI resources, including PubMed.
A Python package and model (for use with spaCy) for named entity recognition (NER) of medication-related concepts.
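Several of the libraries above retrieve PubMed records through the NCBI E-utilities web service. As a rough illustration of what such a request looks like, the following sketch builds an `efetch` URL using only the Python standard library. The helper names `build_efetch_url` and `fetch_abstracts` are hypothetical, the PMIDs are placeholders rather than real references, and a production client would also need to respect NCBI's usage guidelines (rate limits, contact email, API key).

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch endpoint.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmids, rettype="abstract", retmode="text"):
    """Build an efetch URL for a list of PubMed IDs (hypothetical helper)."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(p) for p in pmids),  # comma-separated ID list
        "rettype": rettype,
        "retmode": retmode,
    }
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


def fetch_abstracts(pmids, timeout=30):
    """Download plain-text abstracts; requires network access."""
    with urlopen(build_efetch_url(pmids), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder PMIDs, used only to show the URL shape.
    print(build_efetch_url([12345678, 23456789]))
```

In practice, the libraries listed above wrap this kind of request and also handle parsing, batching, and error handling, so they are usually the better choice.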
Tools, Platforms, and Services
A system for processing the text in electronic medical records. Widely used and open source.
A system for processing documents describing cancer presentations. Based on cTAKES (see above).
A framework for running text mining tools on the newest set(s) of documents from PubMed.
An information extraction (IE) infrastructure for electronic health records (EHR). Built on the CogStack project.
A framework for IE from tables in the literature.
Annotation Tools
An annotation tool with adjudication and progress tracking features.
The brat rapid annotation tool. Supports producing text annotations visually, through the browser. Not subject specific; appropriate for many annotation projects. Visualization is based on that of the stav tool.
An annotation tool designed to have minimal dependencies.
Techniques and Models
BERT models
A PubMed and PubMed Central-trained version of the BERT language model.
A PubMed and PubMed Central-trained version of the BERT language model.
A BERT model trained on >1M papers from the Semantic Scholar database.
A BERT model pre-trained on PubMed text and MIMIC-III notes.