---
title: AI Natural Language Tests
emoji: 🧪
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "6.12.0"
python_version: "3.10"
app_file: app.py
pinned: false
---
# AI-Powered E2E Test Generation Platform
*Translate natural language requirements into production-ready end-to-end tests.*

An enterprise-grade platform that generates and executes Cypress, Playwright, and WebdriverIO end-to-end tests from natural language requirements.
This project combines LLM-driven generation, LangGraph workflow orchestration, and vector-based pattern learning to improve test authoring speed while maintaining repeatability and CI/CD readiness.
## Try It Live
ai-natural-language-tests on Hugging Face Spaces — Try the platform directly in your browser without installation.
## Product Preview
Generate tests from plain-English requirements, inspect workflow logs, review structured output, and download the generated spec from a single interface.
## Table of Contents
| Section | Links |
|---|---|
| Getting Started | Product Preview · Overview · Quick Start (5 Minutes) · Business Value · Core Capabilities |
| Platform Design | Architecture · Workflow · Technology Stack |
| Setup and Configuration | Repository Structure · Prerequisites · Installation · GitHub Registry (GHCR) · Configuration |
| Using the Platform | Usage · CI/CD Integration |
| Operations | Security and Compliance Guidance · Troubleshooting · Compliance and Data Handling · Operational Expectations · Support Matrix |
| Project Info | Documentation Map · Versioning and Release Policy · Support and Security Reporting · Changelog |
## Overview
The platform translates natural language requirements into executable E2E tests for:
```mermaid
flowchart LR
    A[AI-Powered E2E Test Generation]
    A --> C[Cypress<br/>.cy.js<br/>Traditional and prompt-powered]
    A --> P[Playwright<br/>.spec.ts<br/>TypeScript async/await]
    A --> W[WebdriverIO<br/>.spec.js<br/>Mocha with Jest-like expect]
    style A fill:#e3f2fd,color:#333333,stroke:#666666
    style C fill:#c8e6c9,color:#333333,stroke:#666666
    style P fill:#ffcdd2,color:#333333,stroke:#666666
    style W fill:#ffe0b2,color:#333333,stroke:#666666
```
| Framework | Output | Style |
|---|---|---|
| Cypress | `.cy.js` | Traditional & prompt-powered |
| Playwright | `.spec.ts` | TypeScript async/await |
| WebdriverIO | `.spec.js` | Mocha runner with Jest-like expect |
It supports both local engineering workflows and automated pipeline execution. The generator uses contextual data from live HTML analysis and historical pattern matching to produce stable, maintainable test assets.
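To illustrate the "live HTML analysis" idea, here is a minimal sketch of pulling likely form selectors out of a page so a generated test can target them. This is not the project's actual extractor, just a toy version built on Python's standard-library `html.parser`:

```python
# Illustrative sketch only: collect simple CSS selectors for form
# elements from raw HTML, the way a URL-analysis step might.
from html.parser import HTMLParser


class SelectorExtractor(HTMLParser):
    """Collect simple CSS selectors for forms, inputs, and buttons."""

    def __init__(self):
        super().__init__()
        self.selectors = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("form", "input", "button"):
            return
        attr_map = dict(attrs)
        if "id" in attr_map:
            self.selectors.append(f"#{attr_map['id']}")
        elif "name" in attr_map:
            self.selectors.append(f"{tag}[name='{attr_map['name']}']")
        else:
            self.selectors.append(tag)


page = """
<form id="login">
  <input id="username" type="text">
  <input id="password" type="password">
  <button type="submit">Login</button>
</form>
"""

extractor = SelectorExtractor()
extractor.feed(page)
print(extractor.selectors)  # -> ['#login', '#username', '#password', 'button']
```

A real analyzer would also weigh selector stability (prefer `id` and `data-*` attributes over positional selectors), which is the main reason generated tests stay maintainable.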
## Quick Start (5 Minutes)
```bash
git clone https://github.com/aiqualitylab/ai-natural-language-tests.git
cd ai-natural-language-tests
python -m venv .venv
# Windows PowerShell: .\.venv\Scripts\Activate.ps1
# macOS/Linux: source .venv/bin/activate
pip install -r requirements.txt
npm ci
npx playwright install chromium
```
Create `.env` and set at least one provider key:

```env
OPENAI_API_KEY=your_key
```
Generate and run one Playwright test:

```bash
python qa_automation.py "Test login with valid credentials" --url https://the-internet.herokuapp.com/login --framework playwright --run
```
## Business Value
> [!NOTE]
> - Reduces manual test authoring effort and onboarding time.
> - Standardizes generated test structure across teams.
> - Improves reuse through vector-based pattern memory.
> - Supports enterprise delivery with CI/CD and Docker workflows.
> - Enables faster root-cause diagnosis using AI-assisted failure analysis.
## Core Capabilities
| Capability | Detail |
|---|---|
| Test Generation | Natural language to executable E2E test generation |
| Orchestration | LangGraph-based multi-step orchestration |
| URL Analysis | Dynamic URL analysis and fixture generation |
| Pattern Memory | Pattern storage and semantic retrieval using FAISS + SQLite |
| LLM Support | Multi-provider: OpenAI, Anthropic, Google |
| Cypress Modes | Traditional mode and Cypress prompt-powered mode |
| Playwright | TypeScript generation |
| WebdriverIO | JavaScript .spec.js generation with Mocha and Chrome runner support |
| HITL | Optional human approval gate with `--approve` |
| Replay | HTML snapshot replay with `--list-html-replays` and `--replay-html-analysis` |
| Execution | Optional immediate test execution after generation |
| Tracing | OpenTelemetry trace export to Grafana Tempo |
| Logging | Optional log shipping to Grafana Loki |
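The pattern-memory capability can be sketched as a store that embeds past requirements and retrieves the most similar one for a new prompt. The real project uses FAISS + SQLite; this hedged toy version substitutes a bag-of-words embedding and plain cosine similarity, and the test paths are hypothetical:

```python
# Toy pattern store: NOT the project's FAISS + SQLite implementation,
# just an illustration of similarity-based pattern retrieval.
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class PatternStore:
    def __init__(self):
        self.patterns = []  # (requirement, generated_test_path)

    def add(self, requirement: str, test_path: str):
        self.patterns.append((requirement, test_path))

    def most_similar(self, query: str):
        scored = [
            (cosine(embed(query), embed(req)), req, path)
            for req, path in self.patterns
        ]
        return max(scored) if scored else None


store = PatternStore()
store.add("Test login with valid credentials", "tests/generated/login.spec.ts")
store.add("Test search returns results", "tests/generated/search.spec.ts")

score, req, path = store.most_similar("Test login fails with wrong password")
print(req)  # -> Test login with valid credentials
```

In the actual platform the retrieved pattern is fed back into the generation prompt, which is what makes repeated runs converge on consistent test structure.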
## Architecture

```mermaid
graph TB
    subgraph "User Input"
        A[Natural Language<br/>Requirements]
        B[URL/HTML Data<br/>--url flag]
        C[CLI Requirement Text<br/>one or more prompts]
    end
    subgraph "AI & Workflow Engine"
        D[LangGraph Workflow<br/>5-Step Process]
        E[Multi-Provider LLM<br/>OpenAI / Anthropic / Google]
        F[Vector Store<br/>Pattern Learning<br/>FAISS + SQLite]
    end
    subgraph "Framework Generation"
        G{Cypress Framework}
        H{Playwright Framework}
        W{WebdriverIO Framework}
        I[Cypress Tests<br/>.cy.js files<br/>Traditional & cy.prompt()]
        J[Playwright Tests<br/>.spec.ts files<br/>TypeScript]
        X[WebdriverIO Tests<br/>.spec.js files<br/>Mocha + expect]
    end
    subgraph "Execution & Analysis"
        K[Cypress Runner<br/>npx cypress run]
        L[Playwright Runner<br/>npx playwright test]
        M[AI Failure Analyzer<br/>--analyze flag<br/>Multi-Provider LLM]
        P[WebdriverIO Runner<br/>npx wdio run]
    end
    A --> D
    B --> D
    C --> D
    D --> E
    E --> F
    F --> D
    D --> G
    D --> H
    D --> W
    G --> I
    H --> J
    W --> X
    I --> K
    J --> L
    X --> P
    K --> M
    L --> M
    P --> M
    style D fill:#e3f2fd,color:#333333,stroke:#666666
    style E fill:#f3e5f5,color:#333333,stroke:#666666
    style F fill:#fff3e0,color:#333333,stroke:#666666
    style G fill:#c8e6c9,color:#333333,stroke:#666666
    style H fill:#ffcdd2,color:#333333,stroke:#666666
    style W fill:#ffe0b2,color:#333333,stroke:#666666
```
### High-Level Components
- **CLI entrypoint (`qa_automation.py`)**
  - Parses arguments, selects mode, and orchestrates actions.
  - Calls `create_workflow()` and handles result output.
- **Configuration and prompts (`qa_config.py`)**
  - Defines framework metadata, LLM settings, and prompt loading utilities.
  - Handles model provider fallback and YAML template parsing.
- **Runtime services (`qa_runtime.py`)**
  - Logging/tracing setup (OpenTelemetry, Grafana Loki) and persistent objects.
  - FAISS + SQLite pattern store lifecycle and query helpers.
  - HTML analysis replay and failure analysis formatting.
- **LangGraph workflow (`qa_workflow.py`)**
  - Defines `TestState` and step nodes (fetch, pattern search, generate, run).
  - Builds workflow graph with conditional transitions and checkpointer.
- **Observability layer (OpenTelemetry + Loki)**
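The provider-fallback behavior described for `qa_config.py` can be sketched as picking the first provider whose API key is present in the environment. The env var names mirror this README; the selection order and function name here are assumptions, not the project's actual logic:

```python
# Hedged sketch of multi-provider fallback: the real qa_config.py
# logic may differ; the precedence order below is an assumption.
import os

PROVIDERS = [
    ("openai", "OPENAI_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("google", "GOOGLE_API_KEY"),
]


def select_provider(env=os.environ):
    """Return the first provider with a configured API key."""
    for name, key_var in PROVIDERS:
        if env.get(key_var):
            return name
    raise RuntimeError(
        "No LLM provider configured: set OPENAI_API_KEY, "
        "ANTHROPIC_API_KEY, or GOOGLE_API_KEY"
    )


print(select_provider({"ANTHROPIC_API_KEY": "sk-test"}))  # -> anthropic
```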
## Workflow

```mermaid
flowchart TD
    A[Start: User Input<br/>Requirements + Framework] --> C[Step 2: Fetch Test Data<br/>Analyze URL/HTML<br/>Extract Selectors<br/>Generate Fixtures]
    C --> D[Step 3: Search Similar Patterns<br/>Query Vector Store<br/>Find Matching Test Patterns<br/>From Past Generations]
    D --> E[Step 4: Generate Tests<br/>Use AI + Patterns<br/>Create Framework-Specific Code<br/>Cypress, Playwright, or WebdriverIO]
    E --> H[HITL Approval<br/>Optional --approve before save]
    H --> F[Step 5: Run Tests<br/>Execute via Framework Runner<br/>Optional --run flag]
    F --> R[Replay Snapshot<br/>--list-html-replays / --replay-html-analysis]
    R --> G[End: Tests Executed<br/>Ready for CI/CD]
    style A fill:#e1f5fe,color:#333333,stroke:#666666
    style C fill:#c8e6c9,color:#333333,stroke:#666666
    style D fill:#ffcdd2,color:#333333,stroke:#666666
    style E fill:#f3e5f5,color:#333333,stroke:#666666
    style F fill:#e8f5e8,color:#333333,stroke:#666666
    style G fill:#f3e5f5,color:#333333,stroke:#666666
```
Generation follows a deterministic five-step flow:
| Step | Name | Description |
|---|---|---|
| 2 | Fetch Test Data | Analyze URL/HTML, extract selectors, generate fixtures |
| 3 | Search Similar Patterns | Query vector store for matching historical patterns |
| 4 | Generate Tests | Use AI + patterns to create framework-specific code, optionally HITL-gated via `--approve` |
| 5 | Run Tests | Optionally execute via framework runner (`--run`) |
| Replay | Debug HTML Analysis | Replay stored HTML snapshots via CLI (`--list-html-replays`, `--replay-html-analysis`) |
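The deterministic flow above can be sketched as a chain of step functions with an optional approval gate. The real implementation uses LangGraph nodes with a checkpointer; this is a plain-Python illustration, and all step bodies are placeholders:

```python
# Illustrative pipeline sketch, not the project's LangGraph workflow.
def fetch_test_data(state):
    # Placeholder for URL/HTML analysis and fixture generation.
    state["selectors"] = ["#username", "#password", "button"]
    return state

def search_patterns(state):
    # Placeholder for the vector-store similarity query.
    state["patterns"] = []
    return state

def generate_tests(state):
    # Placeholder for LLM-backed code generation.
    state["spec"] = f"// test for: {state['requirement']}"
    return state

def run_tests(state):
    # Placeholder for invoking the framework runner.
    state["result"] = "executed"
    return state

def run_workflow(requirement, approve=None, run=False):
    state = {"requirement": requirement}
    for step in (fetch_test_data, search_patterns, generate_tests):
        state = step(state)
    # Optional human-in-the-loop gate (mirrors --approve)
    if approve is not None and not approve(state["spec"]):
        state["result"] = "rejected"
        return state
    if run:  # optional execution (mirrors --run)
        state = run_tests(state)
    return state

state = run_workflow("Test login", approve=lambda spec: True, run=True)
print(state["result"])  # -> executed
```

The fixed step ordering is what makes the flow deterministic: only the gate and the final run are conditional, so repeated runs with the same inputs traverse the same stages.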
## Technology Stack
| Layer | Technology |
|---|---|
| Orchestration | Python CLI orchestration |
| Workflow | LangChain + LangGraph |
| Vector Store | FAISS + SQLite |
| LLM Backends | OpenAI / Anthropic / Google |
| Test Runners | Cypress, Playwright, and WebdriverIO runners |
| Observability | OpenTelemetry SDK and OTLP exporter |
| Logging | Loki logging handler (optional) |
## Repository Structure

<details>
<summary>View repository tree</summary>

```text
ai-natural-language-tests/
|-- cypress/
|   |-- e2e/
|   |   |-- generated/
|   |   `-- prompt-powered/
|   `-- fixtures/
|-- tests/
|   `-- generated/
|-- webdriverio/
|   `-- tests/
|       `-- generated/
|-- prompt_specs/
|-- vector_db/
|-- qa_automation.py
|-- qa_config.py
|-- qa_runtime.py
|-- qa_workflow.py
|-- cypress.config.js
|-- playwright.config.ts
|-- wdio.conf.js
|-- package.json
|-- requirements.txt
|-- Dockerfile
|-- docker-compose.yml
`-- README.md
```

</details>
## Prerequisites
| Requirement | Version / Notes |
|---|---|
| Python | 3.10+ |
| Node.js | 22+ |
| npm | Current stable release |
| Git | Current stable release |
| Playwright browsers | `npx playwright install chromium` |
## Installation

### Local Setup
```bash
git clone https://github.com/aiqualitylab/ai-natural-language-tests.git
cd ai-natural-language-tests
python -m venv .venv
# Windows PowerShell: .\.venv\Scripts\Activate.ps1
# macOS/Linux: source .venv/bin/activate
pip install -r requirements.txt
npm ci
npx playwright install chromium
```
Create `.env` from `.env.example`, then set at least one provider key:

```env
OPENAI_API_KEY=your_key
```
PowerShell quick set for the current session:

```powershell
$env:OPENAI_API_KEY = "your_key"
```
### Optional: GitAgent (Repo-Specific)
This repository includes a targeted gitagent setup for its QA automation workflow:
- `agent.yaml` (manifest)
- `SOUL.md` and `RULES.md` (behavior and constraints)
- `knowledge/` (framework and repo references)
In short: `agent.yaml` defines the repo agent, `SOUL.md` and `RULES.md` define how it should behave, and `knowledge/` gives it project-specific framework guidance.
Quick commands:

```bash
npm run gitagent:validate
npm run gitagent:info
npm run gitagent:export
```
### Docker Setup

```bash
git clone https://github.com/aiqualitylab/ai-natural-language-tests.git
cd ai-natural-language-tests
docker compose build
```
Docker Compose loads `.env` and explicitly forwards observability variables for Tempo and Loki to the container runtime.
Run in the container:

```bash
docker compose run --rm test-generator "Test login" --url https://the-internet.herokuapp.com/login
```
Run with observability enabled:

```bash
docker compose run --rm test-generator \
  "Test login" --url https://the-internet.herokuapp.com/login --framework playwright --run
```
## GitHub Registry (GHCR)
Pre-built Docker images are published to GitHub Container Registry. No local clone or build required.
| Without GHCR | With GHCR |
|---|---|
| Clone → install → build → run | docker run — done |
| Each user builds their own image | One image built once, shared everywhere |
| "Works on my machine" problems | Identical environment for every user |
### Pull and run

```bash
docker pull ghcr.io/aiqualitylab/ai-natural-language-tests:latest
docker run --rm \
  -e OPENAI_API_KEY=your_key \
  ghcr.io/aiqualitylab/ai-natural-language-tests:latest \
  "Test login" --url https://the-internet.herokuapp.com/login
```
### Image tags

| Tag | Use case |
|---|---|
| `latest` | Always the most recently published version — use for quick runs |
| `v4.2.0` | Pinned to a specific release — use in CI/CD for reproducibility |
For publishing and release management, see CONTRIBUTING.md.
## Configuration

### Core API Keys

```env
OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key
GOOGLE_API_KEY=your_key
```
### OpenTelemetry (Grafana Tempo)

```env
OTEL_PROVIDER=grafana
OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-eu-north-0.grafana.net/otlp
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic <base64(instance_id:api_token)>
```
### Loki Logging (Optional)

```env
GRAFANA_LOKI_URL=https://logs-prod-eu-north-0.grafana.net
GRAFANA_INSTANCE_ID=<instance_id>
GRAFANA_API_TOKEN=<logs_write_token>
```
> [!TIP]
> Privacy-first setup:
> - Use an LLM provider account/plan that guarantees no training or zero-retention for API data.
> - Keep `OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_EXPORTER_OTLP_HEADERS`, `GRAFANA_LOKI_URL`, `GRAFANA_INSTANCE_ID`, and `GRAFANA_API_TOKEN` unset to avoid telemetry/log shipping.
> - Use masked or synthetic test data for sensitive fields.
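The masked-data guidance above can be sketched as a small helper that scrubs sensitive fixture fields before they reach prompts or generated tests. The field names here are hypothetical examples, not keys the project actually uses:

```python
# Sketch only: scrub hypothetical sensitive fields from a fixture dict
# before it is embedded in an LLM prompt or a generated test.
SENSITIVE_FIELDS = {"password", "email", "ssn", "credit_card"}


def mask_fixture(fixture: dict) -> dict:
    """Return a copy of the fixture with sensitive values masked."""
    return {
        key: "***MASKED***" if key in SENSITIVE_FIELDS else value
        for key, value in fixture.items()
    }


print(mask_fixture({"username": "demo", "password": "hunter2"}))
# -> {'username': 'demo', 'password': '***MASKED***'}
```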
> [!TIP]
> Need to support a new URL and tune prompts safely? Follow the step-by-step guide in PROMPT_UPDATE_GUIDE.md.
## Usage

### Quick Reference
| Mode | Command |
|---|---|
| Cypress (default) | `python qa_automation.py "requirement" --url <url>` |
| Playwright | `python qa_automation.py "requirement" --url <url> --framework playwright` |
| WebdriverIO | `python qa_automation.py "requirement" --url <url> --framework webdriverio` |
| Prompt-powered Cypress | `python qa_automation.py "requirement" --url <url> --use-prompt` |
| Generate + Execute | `python qa_automation.py "requirement" --url <url> --run` |
| Failure Analysis | `python qa_automation.py --analyze "error message"` |
| Pattern Inventory | `python qa_automation.py --list-patterns` |
> [!TIP]
> If your global `python` misses project dependencies, run with the repository virtual environment:
> - PowerShell: `.\.venv\Scripts\python.exe qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --framework playwright --run`
> - Bash: `./.venv/Scripts/python.exe qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --framework playwright --run`
> [!NOTE]
> The current CLI supports URL-driven generation via `--url`. A direct `--data` JSON input flag is not implemented in this repository yet.
### Natural Language Prompt Examples
| What you type | What AI generates |
|---|---|
| `"Test login with valid credentials"` | Login form fill + submit + success assertion |
| `"Test login fails with wrong password"` | Negative test with error message assertion |
| `"Test contact form submission"` | Form field detection + submit + confirmation |
| `"Test search returns results"` | Search input + trigger + results count assertion |
| `"Test signup with missing fields"` | Validation error coverage for required fields |
| `"Test logout clears session"` | Post-login logout + redirect assertion |
> [!TIP]
> **Writing effective AI requirements**
> - Be specific about the action: "Test login" vs "Test login with valid credentials and verify dashboard loads"
> - Mention the expected outcome when it matters: "...and verify error message appears"
> - Use `--url` to give the AI real page context — it reads the HTML and picks the right selectors automatically
> - Chain multiple requirements in one run: `"Test login" "Test logout" --url <url>`
### Generate Cypress Test

```bash
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login
```
### Generate Playwright Test

```bash
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --framework playwright
```
### Generate WebdriverIO Test

```bash
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --framework webdriverio
```
### Prompt-Powered Cypress Mode

```bash
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --use-prompt
```
### Generate and Execute

```bash
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --framework playwright --run
```
### Failure Analysis

```bash
python qa_automation.py --analyze "CypressError: Element not found"
python qa_automation.py --analyze -f error.log
```
> [!NOTE]
> The AI failure analyzer returns a structured diagnosis:
>
> | Field | Description |
> |---|---|
> | `CATEGORY` | Error type: `SELECTOR`, `TIMEOUT`, `ASSERTION`, `NETWORK`, etc. |
> | `REASON` | Root cause explanation in plain English |
> | `FIX` | Suggested code change or configuration fix |
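A structured diagnosis like the one described above is easy to consume programmatically. This hedged sketch parses a line-oriented `CATEGORY`/`REASON`/`FIX` block into a dict; the exact output format of `--analyze` is an assumption here:

```python
# Sketch only: the real --analyze output format may differ.
def parse_diagnosis(text: str) -> dict:
    """Parse CATEGORY/REASON/FIX lines into a dict."""
    diagnosis = {}
    for line in text.splitlines():
        if ":" in line:
            field, _, value = line.partition(":")
            if field.strip().upper() in ("CATEGORY", "REASON", "FIX"):
                diagnosis[field.strip().upper()] = value.strip()
    return diagnosis


raw = """CATEGORY: SELECTOR
REASON: The #submit button selector no longer matches the DOM.
FIX: Update the selector to button[type='submit']."""

print(parse_diagnosis(raw)["CATEGORY"])  # -> SELECTOR
```

Parsing the diagnosis this way is what makes the "auto-fix and regenerate" stage of a CI pipeline possible: the `CATEGORY` field can route the failure to the right remediation.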
### Pattern Inventory

```bash
python qa_automation.py --list-patterns
```
## CI/CD Integration

```mermaid
flowchart TD
    A[Code Changes<br/>Pushed to Repo] --> B[CI/CD Pipeline<br/>Triggers]
    B --> C[Install Dependencies<br/>pip install -r requirements.txt<br/>npm install]
    C --> D[Generate Tests<br/>python qa_automation.py<br/>--url]
    D --> E[Run Tests<br/>npx cypress run<br/>npx playwright test<br/>npx wdio run]
    E --> F{Tests Pass?}
    F -->|Yes| G[Deploy Application<br/>Success]
    F -->|No| H[AI Failure Analysis<br/>--analyze in pipeline]
    H --> I[Auto-Fix & Regenerate<br/>If possible]
    I --> E
    H --> J[Notify Developers<br/>Manual intervention]
    style A fill:#e1f5fe,color:#333333,stroke:#666666
    style B fill:#fff3e0,color:#333333,stroke:#666666
    style C fill:#c8e6c9,color:#333333,stroke:#666666
    style D fill:#ffcdd2,color:#333333,stroke:#666666
    style E fill:#f3e5f5,color:#333333,stroke:#666666
    style G fill:#e8f5e8,color:#333333,stroke:#666666
    style J fill:#ffebee,color:#333333,stroke:#666666
```
Recommended pipeline stages:
| Stage | Action |
|---|---|
| 1 | Install Python and Node dependencies |
| 2 | Validate environment variables and secrets injection |
| 3 | Generate tests from requirements |
| 4 | Execute generated tests |
| 5 | Publish artifacts and reports |
| 6 | Export telemetry to observability stack |
## Security and Compliance Guidance

> [!IMPORTANT]
> - Store secrets only in secure secret managers (never commit `.env`).
> - Use scoped API tokens with least-privilege access.
> - Rotate provider keys and Grafana tokens on a fixed cadence.
> - Keep generated tests and reports free of sensitive production data.
> - Apply repository protection rules and mandatory CI checks.
## Troubleshooting
> [!WARNING]
> **Traces Not Visible in Grafana Tempo**
> - Verify OTLP endpoint region and datasource selection.
> - Verify the `Authorization=Basic <base64(instance_id:api_token)>` format.
> - Query with: `{resource.service.name="ai-natural-language-tests"}`
> [!NOTE]
> **Loki Authentication Errors**
> - Ensure the token has `logs:write` scope.
> - Confirm the instance ID and logs endpoint match the same Grafana stack.
> [!TIP]
> **Docker Observability Validation**
> - Confirm `.env` includes OTLP and Loki keys before `docker compose run`.
> - Use `docker compose config` to verify environment interpolation.
> - In Grafana Explore, query Tempo with `service.name="ai-natural-language-tests"`.
> - In Grafana Loki, query labels: `{service_name="ai-natural-language-tests"}`.
> [!TIP]
> **Switching to Headed Mode for Debugging**
>
> Tests run headless by default. To debug interactively, switch your framework config:
>
> Cypress:
> - Edit `cypress.config.js` and add `headed: true` after `browser: 'chrome'`
> - Or run: `npx cypress run --headed --spec 'cypress/e2e/generated/*.cy.js'`
>
> Playwright:
> - Edit `playwright.config.ts` and change `headless: true` → `headless: false`
> - Or run: `npx playwright test --headed tests/generated/`
>
> WebdriverIO:
> - Edit `wdio.conf.js` and comment out `'--headless=new'` from the args array
>
> Docker Headed Mode (with X11 forwarding):
>
> ```bash
> docker build --target debug -t ai-tests:debug .
> docker run -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix ai-tests:debug
> ```
>
> - Optional: mainly for Linux visual debugging.
> - Retry with the generated single-spec command from the logs.
## Documentation Map

| Document | Purpose |
|---|---|
| `README.md` | Platform overview, setup, usage, and operations |
| `CONTRIBUTING.md` | Contribution standards, review checks, and branch/PR flow |
| `CHANGELOG.md` | Release history and notable changes |
| `PROMPT_UPDATE_GUIDE.md` | Prompt and URL-tuning workflow |
| `RULES.md` | Repository automation and behavior constraints |
## Versioning and Release Policy

| Policy Area | Guidance |
|---|---|
| Release model | Changelog-driven, documented in `CHANGELOG.md` |
| Production pinning | Prefer version tags such as `v4.2.0` instead of `latest` |
| `latest` usage | Use for local exploration, not for controlled CI/CD |
| Upgrade notes | Breaking changes and upgrade guidance are captured per release |
## Support and Security Reporting
| Topic | Recommended Action |
|---|---|
| Usage and feature requests | Open a GitHub issue with reproduction steps and environment details |
| Vulnerability reporting | Avoid public exploit details; share minimal impact + repro details privately |
| Exposed credentials | Revoke and rotate tokens before sharing logs or artifacts |
## Compliance and Data Handling
| Control Area | Guidance |
|---|---|
| Data minimization | Use synthetic or masked data in prompts, fixtures, and generated tests |
| Secret hygiene | Keep keys in secret managers; never commit secrets |
| Telemetry control | Keep OpenTelemetry and Loki export optional and environment-driven |
| Access control | Use least-privilege tokens for providers and observability |
| Auditability | Use pinned image tags and changelog-referenced releases |
For implementation details and contribution controls, see CONTRIBUTING.md.
## Operational Expectations
These are practical runbook-style expectations for delivery teams. They are operational targets, not contractual SLAs.
| Area | Target | Notes |
|---|---|---|
| Deterministic generation flow | Stable multi-step workflow execution | Uses fixed workflow stages with optional HITL gate |
| CI pipeline repeatability | Reproducible runs with pinned dependencies | Prefer pinned Docker/image tags and locked dependency files |
| Failure triage | Fast first-pass diagnosis | Use `--analyze` output for `CATEGORY`, `REASON`, and `FIX` guidance |
| Incident containment | Rapid credential isolation | Revoke and rotate provider/observability tokens if exposed |
## Support Matrix
The matrix below reflects currently configured and documented project baselines.
## Changelog
Release notes are maintained in CHANGELOG.md, following the Keep a Changelog format.
*Production-focused AI-assisted E2E test generation for modern QA teams.*

© 2026 AI Quality Lab / Sreekanth Harigovindan · tests.aiqualitylab.org