Awesome NLP with Ruby — Speech and Natural Language Processing

NLP Pipeline Subtasks

Pipeline Generation

composable_operations 47 (archived)

Definition framework for operation pipelines.

ruby-spark 226 updated 8y ago

Spark bindings with an easy to understand DSL.

phobos 218 (archived)

Simplified Ruby Client for Apache Kafka.

parallel

Supervisor for parallel execution on multiple CPUs or in many threads.

pwrake 57 updated 6y ago

Rake extensions to run local and remote tasks in parallel.

Multipurpose Engines

open-nlp 91 updated 11mo ago

Ruby Bindings for the OpenNLP Toolkit.

stanford-core-nlp 436 updated 11mo ago

Ruby Bindings for the Stanford CoreNLP tools.

treat 1.4k updated 11mo ago

Natural Language Processing framework for Ruby (like NLTK for Python).

nlp_toolz 2 (archived)

Wrapper over some OpenNLP classes and the original Berkeley Parser.

open_nlp 11 updated 7y ago

JRuby Bindings for the OpenNLP Toolkit.

ruby-spacy 67 updated 1mo ago

Wrapper module for spaCy NLP library via PyCall.

alchemyapi_ruby 36 (archived)

Legacy Ruby SDK for AlchemyAPI/Bluemix.

wit-ruby 282 updated 3y ago

Ruby client library for the Wit.ai Language Understanding Platform.

wlapi 19 updated 3y ago

Ruby client library for Wortschatz Leipzig web services.

monkeylearn-ruby

Sentiment Analysis, Topic Modelling, Language Detection, Named Entity Recognition via a Ruby-based Web API client.

google-cloud-language 1.4k updated 21d ago

Google's Natural Language service API for Ruby.

Language Identification

scylla 37 updated 3y ago

Language Categorization and Identification.

Segmentation

tokenizer 46 updated 9y ago

Simple multilingual tokenizer.

pragmatic_tokenizer 93 updated 1y ago

Multilingual tokenizer to split a string into tokens.

nlp-pure 20 (archived)

Natural language processing algorithms implemented in pure Ruby with minimal dependencies.

textoken 31 updated 4y ago

Simple and customizable text tokenization library.

pragmatic_segmenter

Word Boundary Disambiguation with many cookies.

punkt-segmenter 91 updated 7y ago

Pure Ruby implementation of the Punkt Segmenter.

tactful_tokenizer 80 updated 12y ago

RegExp based tokenizer for different languages.

scalpel 52 updated 2mo ago

Sentence Boundary Disambiguation tool.
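
To give a rough idea of what the tokenizers in this section do, here is a minimal regular-expression sketch in plain Ruby. It does not use any of the gems listed above, the `tokenize` helper is a hypothetical name, and it assumes a deliberately naive rule: a token is either a run of letters/digits (optionally with internal apostrophes) or a single punctuation character. Real tokenizers handle abbreviations, URLs, and language-specific rules that this sketch ignores.

```ruby
# Naive tokenizer sketch (hypothetical helper, not a gem API).
# A token is either:
#   - a run of letters/digits, optionally joined by apostrophes ("Don't"), or
#   - a single character that is neither whitespace nor alphanumeric (",", ".").
def tokenize(text)
  text.scan(/[[:alnum:]]+(?:'[[:alnum:]]+)*|[^[:space:][:alnum:]]/)
end

p tokenize("Don't panic, it's fine.")
# => ["Don't", "panic", ",", "it's", "fine", "."]
```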

Stemming

ruby-stemmer 249 (archived)

Ruby-Stemmer exposes the SnowBall API to Ruby.

uea-stemmer 54 updated 1mo ago

Conservative stemmer for search and indexing.

Lemmatization

lemmatizer 112 updated 4y ago

WordNet based Lemmatizer for English texts.

Lexical Statistics: Counting Types and Tokens

wc 6 updated 14y ago

Facilities to count word occurrences in a text.

word_count 5 (archived)

Word counter for String and Hash objects.
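
The distinction between tokens (every word occurrence) and types (distinct words) can be sketched in a few lines of plain Ruby. This is only an illustration under simple assumptions: words are lowercase letter runs, `lexical_stats` is a hypothetical helper name, and `Enumerable#tally` requires Ruby 2.7 or later. It is not how the gems above are implemented.

```ruby
# Count tokens (all word occurrences) and types (distinct words) in a text.
# `lexical_stats` is a hypothetical helper, not an API of the gems listed above.
def lexical_stats(text)
  tokens = text.downcase.scan(/[[:alpha:]]+/) # naive tokenization: letter runs only
  counts = tokens.tally                       # Hash of word => number of occurrences
  { tokens: tokens.size, types: counts.size, counts: counts }
end

stats = lexical_stats("The cat saw the other cat")
puts stats[:tokens]        # 6
puts stats[:types]         # 4
puts stats[:counts]["cat"] # 2
```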

Filtering Stop Words

stopwords-filter 80 updated 2y ago

Filter and Stop Word Lexicon based on the SnowBall lemmatizer.

Phrasal Level Processing

n_gram 37 updated 4y ago

N-Gram generator.

ruby-ngram 12 updated 12y ago

Break words and phrases into ngrams.

raingrams 70 updated 5y ago

Flexible and general-purpose ngrams library written in pure Ruby.
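
For orientation, n-gram generation itself is a sliding window over a token sequence, which Ruby's built-in `Enumerable#each_cons` already provides. The sketch below uses only the standard library; the `ngrams` helper is a hypothetical name and not the API of any gem above.

```ruby
# Sliding-window n-grams over any sequence of tokens.
# `ngrams` is a hypothetical helper built on Enumerable#each_cons.
def ngrams(tokens, n)
  tokens.each_cons(n).to_a
end

words = "to be or not to be".split
p ngrams(words, 2)
# => [["to", "be"], ["be", "or"], ["or", "not"], ["not", "to"], ["to", "be"]]

# The same helper yields character n-grams when given an array of characters.
p ngrams("ruby".chars, 3).map(&:join)
# => ["rub", "uby"]
```

When the sequence is shorter than `n`, the result is an empty array, which is usually the behaviour a caller wants.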

Constituency Parsing

rley 37 updated 1y ago

Pure Ruby implementation of the Earley Parsing Algorithm for Context-Free Constituency Grammars.

rsyntaxtree 119 updated 3mo ago

Visualization for syntactic trees in Ruby based on RMagick.

Semantic Analysis

amatch 390 updated 3mo ago

Set of five distance types between strings (including Levenshtein, Sellers, Jaro-Winkler, 'pair distance').

damerau-levenshtein 150 updated 1y ago

Calculates edit distance using the Damerau-Levenshtein algorithm.

hotwater 80 updated 13y ago

Fast Ruby FFI string edit distance algorithms.

levenshtein-ffi 151 updated 1y ago

Fast string edit distance computation, using the Damerau-Levenshtein algorithm.

tf_idf

Term Frequency / Inverse Document Frequency in pure Ruby.

tf-idf-similarity 775 updated 2y ago

Calculate the similarity between texts using TF/IDF.
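
Several entries above compute edit distances. As a point of reference, the following is a minimal pure-Ruby sketch of the plain Levenshtein distance (insertions, deletions, substitutions) using the usual two-row dynamic programming table. It is an illustration only: the `levenshtein` helper is a hypothetical name, it does not handle the transpositions that Damerau-Levenshtein adds, and the gems listed here are considerably faster.

```ruby
# Plain Levenshtein distance between two strings (no transpositions).
# `levenshtein` is a hypothetical helper, not the API of any gem above.
def levenshtein(a, b)
  a = a.chars
  b = b.chars
  prev = (0..b.size).to_a # distances from the empty prefix of a to each prefix of b
  a.each_with_index do |ca, i|
    curr = [i + 1]        # distance from a[0..i] to the empty prefix of b
    b.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [
        prev[j + 1] + 1,  # deletion from a
        curr[j] + 1,      # insertion into a
        prev[j] + cost    # substitution (or match when cost is 0)
      ].min
    end
    prev = curr
  end
  prev.last
end

puts levenshtein("kitten", "sitting") # 3
puts levenshtein("flaw", "lawn")      # 2
```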

Pragmatical Analysis

SentimentLib 14 updated 13y ago

Simple extensible sentiment analysis gem.

Projects and Code Examples

words_counted 164 updated 4y ago

Named entity recognition with Stanford NER and Ruby 19 updated 3y ago

NER Examples in Ruby and Java with some explanations.

Going the Distance 60 updated 9y ago

Implementations of various distance algorithms with example calculations.

High Level Tasks

gingerice 477 (archived)

Spelling and Grammar corrections via the Ginger API.

hunspell-i18n 4 updated 13y ago

Ruby bindings to the standard Hunspell Spell Checker.

ffi-hunspell 49 updated 2y ago

FFI based Ruby bindings for Hunspell.

hunspell 35 updated 10mo ago

Ruby bindings to Hunspell via Ruby C API.

alignment 1 updated 12y ago

Alignment routines for bilingual texts (Gale-Church implementation).

microsoft_translator

Ruby client for the Microsoft Translator API.

termit 507 (archived)

Google Translate with speech synthesis in your terminal.

zipf 4 updated 10y ago

Implementation of BLEU and other base algorithms.

stimmung 20 updated 10y ago

Semantic Polarity based on the SentiWS lexicon.

chronic 3.3k updated 2y ago

Pure Ruby natural language date parser.

chronic_between

Simple Ruby natural language parser for date and time ranges.

chronic_duration 357 updated 11mo ago

Pure Ruby parser for elapsed time.

kronic 149 updated 11y ago

Methods for parsing and formatting human readable dates.

nickel 119 updated 8y ago

Extracts date, time, and message information from naturally worded text.

tickle

Parser for recurring and repeating events.

numerizer 38 updated 3y ago

Ruby parser for English number expressions.

ruby-nlp 92 updated 11y ago

Ruby Binding for the Stanford POS Tagger and Named Entity Recognizer.

espeak-ruby 197 updated 1mo ago

Small Ruby API for utilizing 'espeak' and 'lame' to create text-to-speech mp3 files.

tts 94 updated 3y ago

Text-to-Speech conversion using the Google translate service.

att_speech 20 updated 12y ago

Ruby wrapper over the AT&T Speech API for speech to text.

pocketsphinx-ruby

Pocketsphinx bindings.

Full Text Search, Information Retrieval, Indexing

google-api-client 2.9k updated 23d ago

Ruby API library for Google services.

rsolr

Ruby and Rails client library for Apache Solr.

sunspot 3.0k updated 9mo ago

Rails centric client for Apache Solr.

thinking-sphinx 1.6k updated 3mo ago

Active Record plugin for using Sphinx in (not only) Rails based projects.

elasticsearch-rails 3.1k updated 6mo ago

Ruby and Rails integrations for Elasticsearch.

elasticsearch

elasticsearch 2.0k updated 1mo ago

Ruby client and API for Elasticsearch.

Dialog Agents, Assistants, and Chatbots

chatterbot

Straightforward Ruby-based Twitter Bot Framework, using OAuth to authenticate.

lita 1.7k (archived)

Highly extensible chat operations bot framework with persistent storage on Redis.

Linguistic Resources

rwordnet 91 updated 6y ago

Pure Ruby self contained API library for the Princeton WordNet.

wordnet 139 updated 2y ago

Performance tuned bindings for the Princeton WordNet.

Machine Learning Libraries

rb-libsvm 279 updated 2y ago

Support Vector Machines with Ruby.

weka

JRuby bindings for Weka, different ML algorithms implemented through Weka.

decisiontree 1.5k updated 7y ago

Decision Tree ID3 Algorithm in pure Ruby.

rtimbl 5 updated 16y ago

Memory based learners from the Timbl framework.

classifier-reborn 556 updated 1y ago

General classifier module to allow Bayesian and other types of classifications.

lda-ruby 134 updated 1mo ago

Ruby implementation of Latent Dirichlet Allocation for automatic Topic Modelling and Document Clustering.

liblinear-ruby-swig 83 updated 2y ago

Ruby interface to LIBLINEAR (much more efficient than LIBSVM for text classification).

linnaeus 37 updated 10y ago

Redis-backed Bayesian classifier.

maxent_string_classifier

JRuby maximum entropy classifier for string data, based on the OpenNLP Maxent framework.

naive_bayes 49 updated 14y ago

Simple Naive Bayes classifier.

nbayes 154 updated 2y ago

Full-featured, Ruby implementation of Naive Bayes.

omnicat 11 updated 5y ago

Generalized rack framework for text classifications.

omnicat-bayes 31 updated 5y ago

Naive Bayes text classification implementation as an OmniCat classifier strategy.

ruby-fann 506 updated 2y ago

Ruby bindings to the Fast Artificial Neural Network Library (FANN).

rblearn 2 updated 9y ago

Feature Extraction and Crossvalidation library.

Optical Character Recognition

tesseract-ocr 630 updated 8y ago

FFI based wrapper over the Tesseract OCR Engine.

Text Extraction

yomu 503 updated 3y ago

Library for extracting text and metadata from files and documents using the Apache Tika content analysis toolkit.

Language Aware String Manipulation

fuzzy_match 684 updated 4y ago

Fuzzy string comparison with Distance measures and Regular Expression.

fuzzy-string-match 287 updated 6y ago

Fuzzy string matching library for Ruby.

active_support 58.3k updated 21d ago

The Ruby on Rails ActiveSupport gem has various string extensions that can handle case conversion.

fuzzy_tools 23 updated 5mo ago

Toolset for fuzzy searches in Ruby tuned for accuracy.

unicode 80 updated 1y ago

Unicode normalization library.

CommonRegexRuby 80 updated 4y ago

Find many kinds of common information in a string.

regexp-examples 522 updated 1y ago

Generate strings that match a given regular expression.

verbal_expressions 570 updated 3y ago

Make difficult regular expressions easy.

translit_kit 7 updated 3y ago

Transliterate Hebrew & Yiddish text into Latin characters.

re2

High-speed Regular Expression library for Text Mining and Text Extraction.

regex_sample

Sample string generation from a given Regular Expression.

iuliia 10 updated 4y ago

Transliteration of Cyrillic to Latin in many possible ways (defined by the reference implementation).

iuliia 72 (archived)

Transliteration of Cyrillic to Latin in many possible ways (defined by the reference implementation).

Articles, Posts, Talks, and Presentations

2019

Extracting Text From Images Using Ruby

by aonemd

Demystifying Data Science: Analyzing Conference Talks with Rails and Ngrams

by Todd Schneider [video | code]

Natural Language Processing with Ruby: n-grams 33 updated 12y ago

by Nathan Kleyn [tutorial | code]

Practical text classification with Ruby 10 updated 16y ago

by Gleicon Moraes [post | code]

Needs your Help!

ferret 280 updated 3y ago

Information Retrieval in C and Ruby.

summarize 204 updated 14y ago

Ruby native wrapper for Open Text Summarizer.

Related Resources

Neural Machine Translation Implementations 364 updated 3y ago

Awesome Ruby 14.1k updated 27d ago

Among other awesome items a short list of NLP related projects.

Ruby NLP 1.3k updated 3y ago

State-of-the-art collection of Ruby libraries for NLP.

Speech and Natural Language Processing 2.2k updated 7y ago

General List of NLP related resources (mostly not for Ruby programmers).

iRuby 924 updated 3mo ago

IRuby kernel for Jupyter (formerly IPython).

Awesome OCR 3.1k updated 1y ago

Multitude of OCR (Optical Character Recognition) resources.

Awesome TensorFlow 17.7k updated 2mo ago

Machine Learning with TensorFlow libraries.

License

ds-with-ruby 724 updated 2y ago

ml-with-ruby 2.2k updated 1y ago

change-pr 263 updated 1y ago

Contents

NLP Pipeline Subtasks

Pipeline Generation

Multipurpose Engines

Language Identification

Segmentation

Stemming

Lemmatization

Lexical Statistics: Counting Types and Tokens

Filtering Stop Words

Phrasal Level Processing

Constituency Parsing

Semantic Analysis

Pragmatical Analysis

Projects and Code Examples

High Level Tasks

Full Text Search, Information Retrieval, Indexing

elasticsearch

Dialog Agents, Assistants, and Chatbots

Linguistic Resources

Machine Learning Libraries

Optical Character Recognition

Text Extraction

Language Aware String Manipulation

Articles, Posts, Talks, and Presentations

2019

Needs your Help!

Related Resources

License
