Speech and Natural Language Processing > NLP with Ruby
Contents
NLP Pipeline Subtasks
Pipeline Generation
Definition framework for operation pipelines.
Spark bindings with an easy-to-understand DSL.
Simplified Ruby Client for Apache Kafka.
Supervisor for parallel execution on multiple CPUs or in many threads.
Rake extensions to run local and remote tasks in parallel.
Multipurpose Engines
Ruby Bindings for the OpenNLP Toolkit.
Ruby Bindings for the Stanford CoreNLP tools.
Natural Language Processing framework for Ruby (like NLTK for Python).
Wrapper over some OpenNLP classes and the original Berkeley Parser.
JRuby Bindings for the OpenNLP Toolkit.
Wrapper module for spaCy NLP library via PyCall.
Legacy Ruby SDK for AlchemyAPI/Bluemix.
Ruby client library for the Wit.ai Language Understanding Platform.
Ruby client library for Wortschatz Leipzig web services.
Sentiment Analysis, Topic Modelling, Language Detection, Named Entity Recognition via a Ruby-based Web API client.
Google's Natural Language service API for Ruby.
Language Identification
Segmentation
Simple multilingual tokenizer.
Multilingual tokenizer to split a string into tokens.
Natural language processing algorithms implemented in pure Ruby with minimal dependencies.
Simple and customizable text tokenization library.
Word Boundary Disambiguation with many cookies.
Pure Ruby implementation of the Punkt Segmenter.
RegExp based tokenizer for different languages.
Sentence Boundary Disambiguation tool.
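As a rough illustration of what the segmentation libraries above do, here is a minimal sketch in plain Ruby with no gems. The method names `tokenize` and `split_sentences` are hypothetical and do not come from any of the listed libraries, and the rules are deliberately naive: abbreviations such as "Dr." would be split incorrectly, which is the kind of case Punkt-style segmenters are built to handle.

```ruby
# Minimal regex-based tokenizer and sentence splitter (illustrative only).

# Split text into word and punctuation tokens.
# \p{L} matches any Unicode letter, \p{N} any Unicode number character.
# Words may contain inner apostrophes or hyphens ("don't", "state-of-the-art").
def tokenize(text)
  text.scan(/[\p{L}\p{N}]+(?:['\-][\p{L}\p{N}]+)*|[^\s\p{L}\p{N}]/)
end

# Naive sentence boundary detection: break on whitespace that follows
# ".", "!" or "?" and precedes an uppercase letter.
def split_sentences(text)
  text.strip.split(/(?<=[.!?])\s+(?=\p{Lu})/)
end

p tokenize("Don't panic, it's 42!")
p split_sentences("Hello world. This is Ruby! Is it fun? yes.")
```

The first call yields `["Don't", "panic", ",", "it's", "42", "!"]`; the second yields three sentences, because the lowercase "yes." does not trigger a break.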
Stemming
Lexical Statistics: Counting Types and Tokens
Filtering Stop Words
Phrasal Level Processing
Constituency Parsing
Semantic Analysis
Set of five distance types between strings (including Levenshtein, Sellers, Jaro-Winkler, 'pair distance').
Calculates edit distance using the Damerau-Levenshtein algorithm.
Fast Ruby FFI string edit distance algorithms.
Fast string edit distance computation, using the Damerau-Levenshtein algorithm.
Term Frequency / Inverse Document Frequency in pure Ruby.
Calculate the similarity between texts using TF/IDF.
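The two techniques named in this list, edit distance and TF/IDF weighting, can be sketched in a few lines of plain Ruby. This is a minimal illustration, not the implementation used by any of the libraries above: `levenshtein` and `tf_idf` are hypothetical names, the distance is plain Levenshtein (no transpositions, so not Damerau-Levenshtein), and the TF/IDF variant assumed here is relative term frequency times the unsmoothed natural log of N / document frequency.

```ruby
# Levenshtein edit distance via dynamic programming, keeping one row at a time.
def levenshtein(a, b)
  b_chars = b.chars
  prev = (0..b_chars.length).to_a            # distances from "" to prefixes of b
  a.chars.each_with_index do |ca, i|
    curr = [i + 1]                           # distance from prefix of a to ""
    b_chars.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j + 1] + 1,              # deletion
               curr[j] + 1,                  # insertion
               prev[j] + cost].min           # substitution or match
    end
    prev = curr
  end
  prev.last
end

# docs: array of token arrays. Returns one Hash (term => weight) per document.
def tf_idf(docs)
  n = docs.length.to_f
  doc_freq = Hash.new(0)
  docs.each { |tokens| tokens.uniq.each { |t| doc_freq[t] += 1 } }
  docs.map do |tokens|
    counts = Hash.new(0)
    tokens.each { |t| counts[t] += 1 }
    counts.map do |term, count|
      [term, (count.to_f / tokens.length) * Math.log(n / doc_freq[term])]
    end.to_h
  end
end

p levenshtein("kitten", "sitting")
p tf_idf([%w[ruby nlp ruby], %w[python nlp]])
```

`levenshtein("kitten", "sitting")` returns 3. In the TF/IDF example, "nlp" appears in every document and therefore gets weight 0.0, while "ruby" and "python" receive positive weights.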
Pragmatical Analysis
Projects and Code Examples
High Level Tasks
Spelling and Grammar corrections via the Ginger API.
Ruby bindings to the standard Hunspell Spell Checker.
FFI based Ruby bindings for Hunspell.
Ruby bindings to Hunspell via Ruby C API.
Alignment routines for bilingual texts (Gale-Church implementation).
Ruby client for the Microsoft Translator API.
Google Translate with speech synthesis in your terminal.
Implementation of BLEU and other base algorithms.
Semantic Polarity based on the SentiWS lexicon.
Pure Ruby natural language date parser.
Simple Ruby natural language parser for date and time ranges.
Pure Ruby parser for elapsed time.
Methods for parsing and formatting human readable dates.
Extracts date, time, and message information from naturally worded text.
Parser for recurring and repeating events.
Ruby parser for English number expressions.
Ruby Binding for the Stanford POS Tagger and Named Entity Recognizer.
Small Ruby API for utilizing 'espeak' and 'lame' to create text-to-speech mp3 files.
Text-to-Speech conversion using the Google translate service.
Ruby wrapper over the AT&T Speech API for speech to text.
Pocketsphinx bindings.
Full Text Search, Information Retrieval, Indexing
Ruby API library for Google services.
Ruby and Rails client library for Apache Solr.
Rails centric client for Apache Solr.
Active Record plugin for using Sphinx in (not only) Rails based projects.
Ruby and Rails integrations for Elasticsearch.
Dialog Agents, Assistants, and Chatbots
Linguistic Resources
Machine Learning Libraries
Support Vector Machines with Ruby.
JRuby bindings for Weka, different ML algorithms implemented through Weka.
Decision Tree ID3 Algorithm in pure Ruby.
Memory based learners from the Timbl framework.
General classifier module to allow Bayesian and other types of classifications.
Ruby implementation of Latent Dirichlet Allocation for automatic Topic Modelling and Document Clustering.
Ruby interface to LIBLINEAR (much more efficient than LIBSVM for text classification).
Redis-backed Bayesian classifier.
JRuby maximum entropy classifier for string data, based on the OpenNLP Maxent framework.
Simple Naive Bayes classifier.
Full-featured, Ruby implementation of Naive Bayes.
Generalized rack framework for text classifications.
Naive Bayes text classification implementation as an OmniCat classifier strategy.
Ruby bindings to the Fast Artificial Neural Network Library (FANN).
Feature Extraction and Crossvalidation library.
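Several entries above are Naive Bayes classifiers. To show the idea behind them, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing in plain Ruby. The class name `TinyNaiveBayes` and its methods are hypothetical and are not the API of any listed gem; the tokenization (lowercased alphanumeric runs) is an assumption made for brevity.

```ruby
# Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative only).
class TinyNaiveBayes
  def initialize
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # label => word => count
    @doc_counts  = Hash.new(0)                            # label => documents seen
    @vocabulary  = {}                                     # set of all known words
  end

  # Record one labelled training document.
  def train(label, text)
    @doc_counts[label] += 1
    tokenize(text).each do |word|
      @word_counts[label][word] += 1
      @vocabulary[word] = true
    end
  end

  # Return the label with the highest log posterior, or nil if untrained.
  def classify(text)
    words = tokenize(text)
    total_docs = @doc_counts.values.sum.to_f
    scores = @doc_counts.keys.map do |label|
      total_words = @word_counts[label].values.sum
      log_prob = Math.log(@doc_counts[label] / total_docs) # log prior
      words.each do |word|
        count = @word_counts[label][word]
        # Smoothed log likelihood; unseen words get a small non-zero probability.
        log_prob += Math.log((count + 1.0) / (total_words + @vocabulary.size))
      end
      [label, log_prob]
    end
    scores.max_by { |_, score| score }&.first
  end

  private

  def tokenize(text)
    text.downcase.scan(/[[:alnum:]]+/)
  end
end

nb = TinyNaiveBayes.new
nb.train(:spam, "win money now")
nb.train(:spam, "free money offer")
nb.train(:ham, "meeting schedule for monday")
nb.train(:ham, "project status report")
p nb.classify("free money")
p nb.classify("status meeting")
```

With this toy training set, the two calls return `:spam` and `:ham` respectively. Working in log space avoids floating point underflow when many word probabilities are multiplied.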
Optical Character Recognition
Text Extraction
Language Aware String Manipulation
Fuzzy string comparison with Distance measures and Regular Expression.
Fuzzy string matching library for Ruby.
The Rails ActiveSupport gem has various string extensions that can handle case.
Toolset for fuzzy searches in Ruby tuned for accuracy.
Unicode normalization library.
Find many kinds of common information in a string.
Generate strings that match a given regular expression.
Make difficult regular expressions easy.
Transliterate Hebrew & Yiddish text into Latin characters.
High-speed Regular Expression library for Text Mining and Text Extraction.
Sample string generation from a given Regular Expression.
Transliteration from Cyrillic to Latin in many possible ways (defined by the reference implementation).
Transliteration from Cyrillic to Latin in many possible ways (defined by the reference implementation).
Articles, Posts, Talks, and Presentations
by Todd Schneider [video | code]
by Nathan Kleyn [tutorial | code]
by Gleicon Moraes [post | code]
Needs your Help!
Related Resources
Among other awesome items a short list of NLP related projects.
State-of-the-art collection of Ruby libraries for NLP.
General List of NLP related resources (mostly not for Ruby programmers).
IRuby kernel for Jupyter (formerly IPython).
Multitude of OCR (Optical Character Recognition) resources.
Machine Learning with TensorFlow libraries.