aws-pdf-textract-pipeline

An ETL pipeline that crawls PDFs from the web with Puppeteer, extracts their contents as structured data with AWS Textract, and stores the results in DynamoDB.
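
As a rough illustration of the transform step only, the sketch below flattens the `LINE` blocks of a Textract-style response into flat records that could be written to DynamoDB. It is not taken from the project: the function name `textract_lines_to_items`, the record fields, and the `(document_url, line_id)` key layout are all assumptions, and the sample response is hand-written rather than real Textract output.

```python
from typing import Any, Dict, List


def textract_lines_to_items(document_url: str, response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten LINE blocks of a Textract-style response into flat records.

    Hypothetical helper: the record shape and key layout are assumptions,
    not the project's actual schema.
    """
    items: List[Dict[str, Any]] = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE, WORD, and other block types
        items.append({
            "document_url": document_url,   # assumed partition key
            "line_id": block["Id"],         # assumed sort key
            "page": block.get("Page", 1),
            "text": block["Text"],
        })
    return items


# Hand-written sample in the general shape of a Textract response.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1", "Page": 1},
        {"BlockType": "LINE", "Id": "l1", "Page": 1, "Text": "Invoice 2021-04"},
        {"BlockType": "WORD", "Id": "w1", "Page": 1, "Text": "Invoice"},
        {"BlockType": "LINE", "Id": "l2", "Page": 1, "Text": "Total: 42.00"},
    ]
}

items = textract_lines_to_items("https://example.com/report.pdf", sample_response)
```

In a real pipeline, each record would then be written to a table (for example with a DynamoDB batch writer), and the crawl step would supply the PDF URL.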

