Project Awesome

trafilatura

A tool for gathering text and metadata from the web, with built-in content filtering.
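As a rough illustration of what "gathering text" with content filtering can look like, here is a minimal sketch that runs trafilatura's `extract` function on an in-memory HTML string. The sample page and the helper name `extract_main_text` are assumptions made up for this example, not part of the listing, and the code assumes the package has been installed (for instance with `pip install trafilatura`).

```python
from typing import Optional

# Hypothetical sample page: navigation and footer are boilerplate,
# the <article> element holds the content we would like to keep.
SAMPLE_HTML = """
<html>
  <head><title>Example article</title></head>
  <body>
    <nav><a href="/">Home</a> <a href="/about">About</a></nav>
    <article>
      <h1>Example article</h1>
      <p>This paragraph stands in for the main content of a web page.
      It is deliberately a few sentences long so that an extractor has
      something substantial to work with.</p>
      <p>A second paragraph adds a little more body text to the page.</p>
    </article>
    <footer>Copyright notice and unrelated links</footer>
  </body>
</html>
"""


def extract_main_text(html: str) -> Optional[str]:
    """Return the main text of an HTML document, or None if nothing is found."""
    # Imported lazily so this module still loads where trafilatura is missing.
    from trafilatura import extract

    # include_comments=False asks the extractor to skip user comment sections.
    return extract(html, include_comments=False)


if __name__ == "__main__":
    text = extract_main_text(SAMPLE_HTML)
    print(text if text is not None else "No main content detected.")
```

For live pages, the library also offers `fetch_url` to download a document before passing it to `extract`. Note that the result may be `None` when no usable content is detected, so callers should handle that case.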
