---
title: AI Natural Language Tests
emoji: 🧪
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "6.12.0"
python_version: "3.10"
app_file: app.py
pinned: false
---
# AI-Powered E2E Test Generation Platform
*Translate natural language requirements into production-ready end-to-end tests.*

An enterprise-grade platform that generates and executes Cypress, Playwright, and WebdriverIO end-to-end tests from natural language requirements.
This project combines LLM-driven generation, LangGraph workflow orchestration, and vector-based pattern learning to improve test authoring speed while maintaining repeatability and CI/CD readiness.
## Try It Live
ai-natural-language-tests on Hugging Face Spaces — Try the platform directly in your browser without installation.
## Product Preview
Generate tests from plain-English requirements, inspect workflow logs, review structured output, and download the generated spec from a single interface.
## Table of Contents
| Section | Links |
|---|---|
| Getting Started | Product Preview · Overview · Quick Start (5 Minutes) · Business Value · Core Capabilities |
| Platform Design | Architecture · Workflow · Technology Stack |
| Setup and Configuration | Repository Structure · Prerequisites · Installation · GitHub Registry (GHCR) · Configuration |
| Using the Platform | Usage · CI/CD Integration |
| Operations | Security and Compliance Guidance · Troubleshooting · Compliance and Data Handling · Operational Expectations · Support Matrix |
| Project Info | Documentation Map · Versioning and Release Policy · Support and Security Reporting · Changelog |
## Overview
The platform translates natural language requirements into executable E2E tests for:
```mermaid
flowchart LR
    A[AI-Powered E2E Test Generation]
    A --> C[Cypress<br/>.cy.js<br/>Traditional and prompt-powered]
    A --> P[Playwright<br/>.spec.ts<br/>TypeScript async/await]
    A --> W[WebdriverIO<br/>.spec.js<br/>Mocha with Jest-like expect]
    style A fill:#e3f2fd,color:#333333,stroke:#666666
    style C fill:#c8e6c9,color:#333333,stroke:#666666
    style P fill:#ffcdd2,color:#333333,stroke:#666666
    style W fill:#ffe0b2,color:#333333,stroke:#666666
```
| Framework | Output | Style |
|---|---|---|
| Cypress | `.cy.js` | Traditional & prompt-powered |
| Playwright | `.spec.ts` | TypeScript async/await |
| WebdriverIO | `.spec.js` | Mocha runner with Jest-like expect |
It supports both local engineering workflows and automated pipeline execution. The generator uses contextual data from live HTML analysis and historical pattern matching to produce stable, maintainable test assets.
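To illustrate the "live HTML analysis" idea, here is a minimal sketch of pulling likely form selectors out of a page so a generated test can target them. This is not the project's actual extractor, just a toy version built on Python's standard-library `html.parser`:

```python
# Illustrative sketch only: collect simple CSS selectors for form
# elements from raw HTML, the way a URL-analysis step might.
from html.parser import HTMLParser


class SelectorExtractor(HTMLParser):
    """Collect simple CSS selectors for forms, inputs, and buttons."""

    def __init__(self):
        super().__init__()
        self.selectors = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("form", "input", "button"):
            return
        attr_map = dict(attrs)
        if "id" in attr_map:
            self.selectors.append(f"#{attr_map['id']}")
        elif "name" in attr_map:
            self.selectors.append(f"{tag}[name='{attr_map['name']}']")
        else:
            self.selectors.append(tag)


page = """
<form id="login">
  <input id="username" type="text">
  <input id="password" type="password">
  <button type="submit">Login</button>
</form>
"""

extractor = SelectorExtractor()
extractor.feed(page)
print(extractor.selectors)  # -> ['#login', '#username', '#password', 'button']
```

A real analyzer would also weigh selector stability (prefer `id` and `data-*` attributes over positional selectors), which is the main reason generated tests stay maintainable.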
## Quick Start (5 Minutes)
```bash
git clone https://github.com/aiqualitylab/ai-natural-language-tests.git
cd ai-natural-language-tests
python -m venv .venv
# Windows PowerShell: .\.venv\Scripts\Activate.ps1
# macOS/Linux: source .venv/bin/activate
pip install -r requirements.txt
npm ci
npx playwright install chromium
```
Create `.env` and set at least one provider key:

```env
OPENAI_API_KEY=your_key
```
Generate and run one Playwright test:

```bash
python qa_automation.py "Test login with valid credentials" --url https://the-internet.herokuapp.com/login --framework playwright --run
```
## Business Value
> [!NOTE]
> - Reduces manual test authoring effort and onboarding time.
> - Standardizes generated test structure across teams.
> - Improves reuse through vector-based pattern memory.
> - Supports enterprise delivery with CI/CD and Docker workflows.
> - Enables faster root-cause diagnosis using AI-assisted failure analysis.
## Core Capabilities
| Capability | Detail |
|---|---|
| Test Generation | Natural language to executable E2E test generation |
| Orchestration | LangGraph-based multi-step orchestration |
| URL Analysis | Dynamic URL analysis and fixture generation |
| Pattern Memory | Pattern storage and semantic retrieval using FAISS + SQLite |
| LLM Support | Multi-provider: OpenAI, Anthropic, Google |
| Cypress Modes | Traditional mode and Cypress prompt-powered mode |
| Playwright | TypeScript generation |
| WebdriverIO | JavaScript .spec.js generation with Mocha and Chrome runner support |
| HITL | Optional human approval gate with `--approve` |
| Replay | HTML snapshot replay with `--list-html-replays` and `--replay-html-analysis` |
| Execution | Optional immediate test execution after generation |
| Tracing | OpenTelemetry trace export to Grafana Tempo |
| Logging | Optional log shipping to Grafana Loki |
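The pattern-memory capability can be sketched as a store that embeds past requirements and retrieves the most similar one for a new prompt. The real project uses FAISS + SQLite; this hedged toy version substitutes a bag-of-words embedding and plain cosine similarity, and the test paths are hypothetical:

```python
# Toy pattern store: NOT the project's FAISS + SQLite implementation,
# just an illustration of similarity-based pattern retrieval.
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class PatternStore:
    def __init__(self):
        self.patterns = []  # (requirement, generated_test_path)

    def add(self, requirement: str, test_path: str):
        self.patterns.append((requirement, test_path))

    def most_similar(self, query: str):
        scored = [
            (cosine(embed(query), embed(req)), req, path)
            for req, path in self.patterns
        ]
        return max(scored) if scored else None


store = PatternStore()
store.add("Test login with valid credentials", "tests/generated/login.spec.ts")
store.add("Test search returns results", "tests/generated/search.spec.ts")

score, req, path = store.most_similar("Test login fails with wrong password")
print(req)  # -> Test login with valid credentials
```

In the actual platform the retrieved pattern is fed back into the generation prompt, which is what makes repeated runs converge on consistent test structure.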
## Architecture

```mermaid
graph TB
    subgraph "User Input"
        A[Natural Language<br/>Requirements]
        B[URL/HTML Data<br/>--url flag]
        C[CLI Requirement Text<br/>one or more prompts]
    end
    subgraph "AI & Workflow Engine"
        D[LangGraph Workflow<br/>5-Step Process]
        E[Multi-Provider LLM<br/>OpenAI / Anthropic / Google]
        F[Vector Store<br/>Pattern Learning<br/>FAISS + SQLite]
    end
    subgraph "Framework Generation"
        G{Cypress Framework}
        H{Playwright Framework}
        W{WebdriverIO Framework}
        I[Cypress Tests<br/>.cy.js files<br/>Traditional & cy.prompt()]
        J[Playwright Tests<br/>.spec.ts files<br/>TypeScript]
        X[WebdriverIO Tests<br/>.spec.js files<br/>Mocha + expect]
    end
    subgraph "Execution & Analysis"
        K[Cypress Runner<br/>npx cypress run]
        L[Playwright Runner<br/>npx playwright test]
        M[AI Failure Analyzer<br/>--analyze flag<br/>Multi-Provider LLM]
        P[WebdriverIO Runner<br/>npx wdio run]
    end
    A --> D
    B --> D
    C --> D
    D --> E
    E --> F
    F --> D
    D --> G
    D --> H
    D --> W
    G --> I
    H --> J
    W --> X
    I --> K
    J --> L
    X --> P
    K --> M
    L --> M
    P --> M
    style D fill:#e3f2fd,color:#333333,stroke:#666666
    style E fill:#f3e5f5,color:#333333,stroke:#666666
    style F fill:#fff3e0,color:#333333,stroke:#666666
    style G fill:#c8e6c9,color:#333333,stroke:#666666
    style H fill:#ffcdd2,color:#333333,stroke:#666666
    style W fill:#ffe0b2,color:#333333,stroke:#666666
```
### High-Level Components
- **CLI entrypoint (`qa_automation.py`)**
  - Parses arguments, selects mode, and orchestrates actions.
  - Calls `create_workflow()` and handles result output.
- **Configuration and prompts (`qa_config.py`)**
  - Defines framework metadata, LLM settings, and prompt loading utilities.
  - Handles model provider fallback and YAML template parsing.
- **Runtime services (`qa_runtime.py`)**
  - Logging/tracing setup (OpenTelemetry, Grafana Loki) and persistent objects.
  - FAISS + SQLite pattern store lifecycle and query helpers.
  - HTML analysis replay and failure analysis formatting.
- **LangGraph workflow (`qa_workflow.py`)**
  - Defines `TestState` and step nodes (fetch, pattern search, generate, run).
  - Builds workflow graph with conditional transitions and checkpointer.
- **Observability layer (OpenTelemetry + Loki)**
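The provider-fallback behavior described for `qa_config.py` can be sketched as picking the first provider whose API key is present in the environment. The env var names mirror this README; the selection order and function name here are assumptions, not the project's actual logic:

```python
# Hedged sketch of multi-provider fallback: the real qa_config.py
# logic may differ; the precedence order below is an assumption.
import os

PROVIDERS = [
    ("openai", "OPENAI_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("google", "GOOGLE_API_KEY"),
]


def select_provider(env=os.environ):
    """Return the first provider with a configured API key."""
    for name, key_var in PROVIDERS:
        if env.get(key_var):
            return name
    raise RuntimeError(
        "No LLM provider configured: set OPENAI_API_KEY, "
        "ANTHROPIC_API_KEY, or GOOGLE_API_KEY"
    )


print(select_provider({"ANTHROPIC_API_KEY": "sk-test"}))  # -> anthropic
```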
## Workflow

```mermaid
flowchart TD
    A[Start: User Input<br/>Requirements + Framework] --> C[Step 2: Fetch Test Data<br/>Analyze URL/HTML<br/>Extract Selectors<br/>Generate Fixtures]
    C --> D[Step 3: Search Similar Patterns<br/>Query Vector Store<br/>Find Matching Test Patterns<br/>From Past Generations]
    D --> E[Step 4: Generate Tests<br/>Use AI + Patterns<br/>Create Framework-Specific Code<br/>Cypress, Playwright, or WebdriverIO]
    E --> H[HITL Approval<br/>Optional --approve before save]
    H --> F[Step 5: Run Tests<br/>Execute via Framework Runner<br/>Optional --run flag]
    F --> R[Replay Snapshot<br/>--list-html-replays / --replay-html-analysis]
    R --> G[End: Tests Executed<br/>Ready for CI/CD]
    style A fill:#e1f5fe,color:#333333,stroke:#666666
    style C fill:#c8e6c9,color:#333333,stroke:#666666
    style D fill:#ffcdd2,color:#333333,stroke:#666666
    style E fill:#f3e5f5,color:#333333,stroke:#666666
    style F fill:#e8f5e8,color:#333333,stroke:#666666
    style G fill:#f3e5f5,color:#333333,stroke:#666666
```
Generation follows a deterministic five-step flow:
| Step | Name | Description |
|---|---|---|
| 2 | Fetch Test Data | Analyze URL/HTML, extract selectors, generate fixtures |
| 3 | Search Similar Patterns | Query vector store for matching historical patterns |
| 4 | Generate Tests | Use AI + patterns to create framework-specific code, optionally HITL-gated via `--approve` |
| 5 | Run Tests | Optionally execute via framework runner (`--run`) |
| Replay | Debug HTML Analysis | Replay stored HTML snapshots via CLI (`--list-html-replays`, `--replay-html-analysis`) |
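The deterministic flow above can be sketched as a chain of step functions with an optional approval gate. The real implementation uses LangGraph nodes with a checkpointer; this is a plain-Python illustration, and all step bodies are placeholders:

```python
# Illustrative pipeline sketch, not the project's LangGraph workflow.
def fetch_test_data(state):
    # Placeholder for URL/HTML analysis and fixture generation.
    state["selectors"] = ["#username", "#password", "button"]
    return state

def search_patterns(state):
    # Placeholder for the vector-store similarity query.
    state["patterns"] = []
    return state

def generate_tests(state):
    # Placeholder for LLM-backed code generation.
    state["spec"] = f"// test for: {state['requirement']}"
    return state

def run_tests(state):
    # Placeholder for invoking the framework runner.
    state["result"] = "executed"
    return state

def run_workflow(requirement, approve=None, run=False):
    state = {"requirement": requirement}
    for step in (fetch_test_data, search_patterns, generate_tests):
        state = step(state)
    # Optional human-in-the-loop gate (mirrors --approve)
    if approve is not None and not approve(state["spec"]):
        state["result"] = "rejected"
        return state
    if run:  # optional execution (mirrors --run)
        state = run_tests(state)
    return state

state = run_workflow("Test login", approve=lambda spec: True, run=True)
print(state["result"])  # -> executed
```

The fixed step ordering is what makes the flow deterministic: only the gate and the final run are conditional, so repeated runs with the same inputs traverse the same stages.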
## Technology Stack
| Layer | Technology |
|---|---|
| Orchestration | Python CLI orchestration |
| Workflow | LangChain + LangGraph |
| Vector Store | FAISS + SQLite |
| LLM Backends | OpenAI / Anthropic / Google |
| Test Runners | Cypress, Playwright, and WebdriverIO runners |
| Observability | OpenTelemetry SDK and OTLP exporter |
| Logging | Loki logging handler (optional) |
## Repository Structure

<details>
<summary>View repository tree</summary>

```text
ai-natural-language-tests/
|-- cypress/
|   |-- e2e/
|   |   |-- generated/
|   |   `-- prompt-powered/
|   `-- fixtures/
|-- tests/
|   `-- generated/
|-- webdriverio/
|   `-- tests/
|       `-- generated/
|-- prompt_specs/
|-- vector_db/
|-- qa_automation.py
|-- qa_config.py
|-- qa_runtime.py
|-- qa_workflow.py
|-- cypress.config.js
|-- playwright.config.ts
|-- wdio.conf.js
|-- package.json
|-- requirements.txt
|-- Dockerfile
|-- docker-compose.yml
`-- README.md
```

</details>
## Prerequisites
| Requirement | Version / Notes |
|---|---|
| Python | 3.10+ |
| Node.js | 22+ |
| npm | Current stable release |
| Git | Current stable release |
| Playwright browsers | `npx playwright install chromium` |
## Installation

### Local Setup
```bash
git clone https://github.com/aiqualitylab/ai-natural-language-tests.git
cd ai-natural-language-tests
python -m venv .venv
# Windows PowerShell: .\.venv\Scripts\Activate.ps1
# macOS/Linux: source .venv/bin/activate
pip install -r requirements.txt
npm ci
npx playwright install chromium
```
Create `.env` from `.env.example`, then set at least one provider key:

```env
OPENAI_API_KEY=your_key
```
PowerShell quick set for the current session:

```powershell
$env:OPENAI_API_KEY = "your_key"
```
### Optional: GitAgent (Repo-Specific)
This repository includes a targeted gitagent setup for its QA automation workflow:
- `agent.yaml` (manifest)
- `SOUL.md` and `RULES.md` (behavior and constraints)
- `knowledge/` (framework and repo references)
In short: `agent.yaml` defines the repo agent, `SOUL.md` and `RULES.md` define how it should behave, and `knowledge/` gives it project-specific framework guidance.
Quick commands:

```bash
npm run gitagent:validate
npm run gitagent:info
npm run gitagent:export
```
### Docker Setup

```bash
git clone https://github.com/aiqualitylab/ai-natural-language-tests.git
cd ai-natural-language-tests
docker compose build
```
Docker Compose loads `.env` and explicitly forwards observability variables for Tempo and Loki to the container runtime.
Run in the container:

```bash
docker compose run --rm test-generator "Test login" --url https://the-internet.herokuapp.com/login
```
Run with observability enabled:

```bash
docker compose run --rm test-generator \
  "Test login" --url https://the-internet.herokuapp.com/login --framework playwright --run
```
## GitHub Registry (GHCR)
Pre-built Docker images are published to GitHub Container Registry. No local clone or build required.
| Without GHCR | With GHCR |
|---|---|
| Clone → install → build → run | docker run — done |
| Each user builds their own image | One image built once, shared everywhere |
| "Works on my machine" problems | Identical environment for every user |
### Pull and run

```bash
docker pull ghcr.io/aiqualitylab/ai-natural-language-tests:latest
docker run --rm \
  -e OPENAI_API_KEY=your_key \
  ghcr.io/aiqualitylab/ai-natural-language-tests:latest \
  "Test login" --url https://the-internet.herokuapp.com/login
```
### Image tags

| Tag | Use case |
|---|---|
| `latest` | Always the most recently published version — use for quick runs |
| `v4.2.0` | Pinned to a specific release — use in CI/CD for reproducibility |
For publishing and release management, see CONTRIBUTING.md.
## Configuration

### Core API Keys

```env
OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key
GOOGLE_API_KEY=your_key
```
### OpenTelemetry (Grafana Tempo)

```env
OTEL_PROVIDER=grafana
OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-eu-north-0.grafana.net/otlp
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic <base64(instance_id:api_token)>
```
### Loki Logging (Optional)

```env
GRAFANA_LOKI_URL=https://logs-prod-eu-north-0.grafana.net
GRAFANA_INSTANCE_ID=<instance_id>
GRAFANA_API_TOKEN=<logs_write_token>
```
> [!TIP]
> Privacy-first setup:
> - Use an LLM provider account/plan that guarantees no training or zero-retention for API data.
> - Keep `OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_EXPORTER_OTLP_HEADERS`, `GRAFANA_LOKI_URL`, `GRAFANA_INSTANCE_ID`, and `GRAFANA_API_TOKEN` unset to avoid telemetry/log shipping.
> - Use masked or synthetic test data for sensitive fields.
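The masked-data guidance above can be sketched as a small helper that scrubs sensitive fixture fields before they reach prompts or generated tests. The field names here are hypothetical examples, not keys the project actually uses:

```python
# Sketch only: scrub hypothetical sensitive fields from a fixture dict
# before it is embedded in an LLM prompt or a generated test.
SENSITIVE_FIELDS = {"password", "email", "ssn", "credit_card"}


def mask_fixture(fixture: dict) -> dict:
    """Return a copy of the fixture with sensitive values masked."""
    return {
        key: "***MASKED***" if key in SENSITIVE_FIELDS else value
        for key, value in fixture.items()
    }


print(mask_fixture({"username": "demo", "password": "hunter2"}))
# -> {'username': 'demo', 'password': '***MASKED***'}
```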
> [!TIP]
> Need to support a new URL and tune prompts safely? Follow the step-by-step guide in PROMPT_UPDATE_GUIDE.md.
## Usage

### Quick Reference
| Mode | Command |
|---|---|
| Cypress (default) | `python qa_automation.py "requirement" --url <url>` |
| Playwright | `python qa_automation.py "requirement" --url <url> --framework playwright` |
| WebdriverIO | `python qa_automation.py "requirement" --url <url> --framework webdriverio` |
| Prompt-powered Cypress | `python qa_automation.py "requirement" --url <url> --use-prompt` |
| Generate + Execute | `python qa_automation.py "requirement" --url <url> --run` |
| Failure Analysis | `python qa_automation.py --analyze "error message"` |
| Pattern Inventory | `python qa_automation.py --list-patterns` |
> [!TIP]
> If your global `python` misses project dependencies, run with the repository virtual environment:
> - PowerShell: `.\.venv\Scripts\python.exe qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --framework playwright --run`
> - Bash: `./.venv/Scripts/python.exe qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --framework playwright --run`
> [!NOTE]
> The current CLI supports URL-driven generation via `--url`. A direct `--data` JSON input flag is not implemented in this repository yet.
### Natural Language Prompt Examples
| What you type | What AI generates |
|---|---|
| `"Test login with valid credentials"` | Login form fill + submit + success assertion |
| `"Test login fails with wrong password"` | Negative test with error message assertion |
| `"Test contact form submission"` | Form field detection + submit + confirmation |
| `"Test search returns results"` | Search input + trigger + results count assertion |
| `"Test signup with missing fields"` | Validation error coverage for required fields |
| `"Test logout clears session"` | Post-login logout + redirect assertion |
> [!TIP]
> **Writing effective AI requirements**
> - Be specific about the action: "Test login" vs "Test login with valid credentials and verify dashboard loads"
> - Mention the expected outcome when it matters: "...and verify error message appears"
> - Use `--url` to give the AI real page context — it reads the HTML and picks the right selectors automatically
> - Chain multiple requirements in one run: `"Test login" "Test logout" --url <url>`
### Generate Cypress Test

```bash
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login
```
### Generate Playwright Test

```bash
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --framework playwright
```
### Generate WebdriverIO Test

```bash
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --framework webdriverio
```
### Prompt-Powered Cypress Mode

```bash
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --use-prompt
```
### Generate and Execute

```bash
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --framework playwright --run
```
### Failure Analysis

```bash
python qa_automation.py --analyze "CypressError: Element not found"
python qa_automation.py --analyze -f error.log
```
> [!NOTE]
> The AI failure analyzer returns a structured diagnosis:
>
> | Field | Description |
> |---|---|
> | `CATEGORY` | Error type: `SELECTOR`, `TIMEOUT`, `ASSERTION`, `NETWORK`, etc. |
> | `REASON` | Root cause explanation in plain English |
> | `FIX` | Suggested code change or configuration fix |
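A structured diagnosis like the one described above is easy to consume programmatically. This hedged sketch parses a line-oriented `CATEGORY`/`REASON`/`FIX` block into a dict; the exact output format of `--analyze` is an assumption here:

```python
# Sketch only: the real --analyze output format may differ.
def parse_diagnosis(text: str) -> dict:
    """Parse CATEGORY/REASON/FIX lines into a dict."""
    diagnosis = {}
    for line in text.splitlines():
        if ":" in line:
            field, _, value = line.partition(":")
            if field.strip().upper() in ("CATEGORY", "REASON", "FIX"):
                diagnosis[field.strip().upper()] = value.strip()
    return diagnosis


raw = """CATEGORY: SELECTOR
REASON: The #submit button selector no longer matches the DOM.
FIX: Update the selector to button[type='submit']."""

print(parse_diagnosis(raw)["CATEGORY"])  # -> SELECTOR
```

Parsing the diagnosis this way is what makes the "auto-fix and regenerate" stage of a CI pipeline possible: the `CATEGORY` field can route the failure to the right remediation.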
### Pattern Inventory

```bash
python qa_automation.py --list-patterns
```
## CI/CD Integration

```mermaid
flowchart TD
    A[Code Changes<br/>Pushed to Repo] --> B[CI/CD Pipeline<br/>Triggers]
    B --> C[Install Dependencies<br/>pip install -r requirements.txt<br/>npm install]
    C --> D[Generate Tests<br/>python qa_automation.py<br/>--url]
    D --> E[Run Tests<br/>npx cypress run<br/>npx playwright test<br/>npx wdio run]
    E --> F{Tests Pass?}
    F -->|Yes| G[Deploy Application<br/>Success]
    F -->|No| H[AI Failure Analysis<br/>--analyze in pipeline]
    H --> I[Auto-Fix & Regenerate<br/>If possible]
    I --> E
    H --> J[Notify Developers<br/>Manual intervention]
    style A fill:#e1f5fe,color:#333333,stroke:#666666
    style B fill:#fff3e0,color:#333333,stroke:#666666
    style C fill:#c8e6c9,color:#333333,stroke:#666666
    style D fill:#ffcdd2,color:#333333,stroke:#666666
    style E fill:#f3e5f5,color:#333333,stroke:#666666
    style G fill:#e8f5e8,color:#333333,stroke:#666666
    style J fill:#ffebee,color:#333333,stroke:#666666
```
Recommended pipeline stages:
| Stage | Action |
|---|---|
| 1 | Install Python and Node dependencies |
| 2 | Validate environment variables and secrets injection |
| 3 | Generate tests from requirements |
| 4 | Execute generated tests |
| 5 | Publish artifacts and reports |
| 6 | Export telemetry to observability stack |
## Security and Compliance Guidance

> [!IMPORTANT]
> - Store secrets only in secure secret managers (never commit `.env`).
> - Use scoped API tokens with least-privilege access.
> - Rotate provider keys and Grafana tokens on a fixed cadence.
> - Keep generated tests and reports free of sensitive production data.
> - Apply repository protection rules and mandatory CI checks.
## Troubleshooting
> [!WARNING]
> **Traces Not Visible in Grafana Tempo**
> - Verify OTLP endpoint region and datasource selection.
> - Verify the `Authorization=Basic <base64(instance_id:api_token)>` format.
> - Query with: `{resource.service.name="ai-natural-language-tests"}`
> [!NOTE]
> **Loki Authentication Errors**
> - Ensure the token has `logs:write` scope.
> - Confirm the instance ID and logs endpoint match the same Grafana stack.
> [!TIP]
> **Docker Observability Validation**
> - Confirm `.env` includes OTLP and Loki keys before `docker compose run`.
> - Use `docker compose config` to verify environment interpolation.
> - In Grafana Explore, query Tempo with `service.name="ai-natural-language-tests"`.
> - In Grafana Loki, query labels: `{service_name="ai-natural-language-tests"}`.
> [!TIP]
> **Switching to Headed Mode for Debugging**
>
> Tests run headless by default. To debug interactively, switch your framework config:
>
> Cypress:
> - Edit `cypress.config.js` and add `headed: true` after `browser: 'chrome'`
> - Or run: `npx cypress run --headed --spec 'cypress/e2e/generated/*.cy.js'`
>
> Playwright:
> - Edit `playwright.config.ts` and change `headless: true` → `headless: false`
> - Or run: `npx playwright test --headed tests/generated/`
>
> WebdriverIO:
> - Edit `wdio.conf.js` and comment out `'--headless=new'` from the args array
>
> Docker Headed Mode (with X11 forwarding):
>
> ```bash
> docker build --target debug -t ai-tests:debug .
> docker run -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix ai-tests:debug
> ```
>
> - Optional: mainly for Linux visual debugging.
> - Retry with the generated single-spec command from the logs.
## Documentation Map

| Document | Purpose |
|---|---|
| `README.md` | Platform overview, setup, usage, and operations |
| `CONTRIBUTING.md` | Contribution standards, review checks, and branch/PR flow |
| `CHANGELOG.md` | Release history and notable changes |
| `PROMPT_UPDATE_GUIDE.md` | Prompt and URL-tuning workflow |
| `RULES.md` | Repository automation and behavior constraints |
## Versioning and Release Policy

| Policy Area | Guidance |
|---|---|
| Release model | Changelog-driven, documented in `CHANGELOG.md` |
| Production pinning | Prefer version tags such as `v4.2.0` instead of `latest` |
| `latest` usage | Use for local exploration, not for controlled CI/CD |
| Upgrade notes | Breaking changes and upgrade guidance are captured per release |
## Support and Security Reporting
| Topic | Recommended Action |
|---|---|
| Usage and feature requests | Open a GitHub issue with reproduction steps and environment details |
| Vulnerability reporting | Avoid public exploit details; share minimal impact + repro details privately |
| Exposed credentials | Revoke and rotate tokens before sharing logs or artifacts |
## Compliance and Data Handling
| Control Area | Guidance |
|---|---|
| Data minimization | Use synthetic or masked data in prompts, fixtures, and generated tests |
| Secret hygiene | Keep keys in secret managers; never commit secrets |
| Telemetry control | Keep OpenTelemetry and Loki export optional and environment-driven |
| Access control | Use least-privilege tokens for providers and observability |
| Auditability | Use pinned image tags and changelog-referenced releases |
For implementation details and contribution controls, see CONTRIBUTING.md.
## Operational Expectations
These are practical runbook-style expectations for delivery teams. They are operational targets, not contractual SLAs.
| Area | Target | Notes |
|---|---|---|
| Deterministic generation flow | Stable multi-step workflow execution | Uses fixed workflow stages with optional HITL gate |
| CI pipeline repeatability | Reproducible runs with pinned dependencies | Prefer pinned Docker/image tags and locked dependency files |
| Failure triage | Fast first-pass diagnosis | Use `--analyze` output for `CATEGORY`, `REASON`, and `FIX` guidance |
| Incident containment | Rapid credential isolation | Revoke and rotate provider/observability tokens if exposed |
## Support Matrix
The matrix below reflects currently configured and documented project baselines.
## Changelog
Release notes are maintained in CHANGELOG.md, following the Keep a Changelog format.
*Production-focused AI-assisted E2E test generation for modern QA teams.*

© 2026 AI Quality Lab / Sreekanth Harigovindan · tests.aiqualitylab.org