BrowserClaw

AI browser automation via accessibility snapshots and ref targeting, built on Playwright.

Package 26 stars GitHub

BrowserClaw

DISCLAIMER: This project is NOT affiliated with browserclaw.com in any form. We have no connection to that site and recommend treating it with caution.

The AI-native browser automation library — born from OpenClaw, built for agents. Snapshot + ref targeting — no CSS selectors, no XPath, no vision, just numbered refs that map to interactive elements.

import { BrowserClaw } from 'browserclaw';

const browser = await BrowserClaw.launch({ url: 'https://demo.playwright.dev/todomvc' });
const page = await browser.currentPage();

// Snapshot — the core feature
const { snapshot, refs } = await page.snapshot();
// snapshot: AI-readable text tree
// refs: { "e1": { role: "textbox", name: "What needs to be done?" }, "e2": { role: "link", name: "Playwright" } }

await page.type('e1', 'Buy groceries', { submit: true }); // Type by ref
await page.click('e2'); // Click by ref
await browser.stop();

Why browserclaw?

Most browser automation tools were built for humans writing test scripts. AI agents need something different:

Vision-based tools (screenshot → click coordinates) are slow, expensive, and probabilistic
Selector-based tools (CSS/XPath) are brittle and meaningless to an LLM
browserclaw gives the AI a text snapshot with numbered refs — the AI reads text (what it's best at) and returns a ref ID (deterministic targeting)

The snapshot + ref pattern means:

Deterministic — refs resolve to exact elements via Playwright locators, no guessing
Fast — text snapshots are tiny compared to screenshots
Cheap — no vision API calls, just text in/text out
Reliable — built on Playwright, the most robust browser automation engine

Comparison with Other Tools

The AI browser automation space is moving fast. Here's how browserclaw compares to the major alternatives.

	browserclaw	browser-use	Stagehand	Playwright MCP
Ref → exact element, no guessing	:white_check_mark:	:heavy_minus_sign:	:x:	:white_check_mark:
No vision model in the loop	:white_check_mark:	:heavy_minus_sign:	:white_check_mark:	:white_check_mark:
Survives redesigns (semantic, not pixel)	:white_check_mark:	:heavy_minus_sign:	:white_check_mark:	:white_check_mark:
Fill 10 form fields in one call	:white_check_mark:	:x:	:x:	:x:
Interact with cross-origin iframes	:white_check_mark:	:white_check_mark:	:x:	:x:
Playwright engine (auto-wait, locators)	:white_check_mark:	:x:	:white_check_mark:	:white_check_mark:
Embeddable in your own JS/TS agent loop	:white_check_mark:	:x:	:heavy_minus_sign:	:x:

:white_check_mark: = Yes :heavy_minus_sign: = Partial :x: = No

browserclaw is the only tool that checks every box. It combines the precision of accessibility snapshots with Playwright's battle-tested engine, batch operations, cross-origin iframe access, and zero framework lock-in — in a single embeddable library.

The key distinction: browser tool vs. AI agent

Most tools in this space are AI agents that happen to control a browser. They own the intelligence layer: they take a task, call an LLM, decide what actions to take, and execute them. That's a complete agent.

browserclaw is different. It's a browser tool — just the eyes and hands. It takes a snapshot and returns refs. It executes actions on refs. The LLM, the reasoning, the task planning — that all lives in your code, in your agent, wherever you want it. browserclaw doesn't have opinions about any of that.

This distinction matters if you're building an agent platform, a product with its own AI layer, or anything where you need to control the intelligence loop. You can't compose an agent-first tool into a system that already has an agent. You end up with two brains fighting over who's in charge.

How each tool works under the hood

browserclaw — Accessibility snapshot with numbered refs → Playwright locator (aria-ref in default mode, getByRole() in role mode). One ref, one element. No vision model, no LLM in the targeting loop. You bring the brain.
browser-use — A complete AI agent: takes a task, calls an LLM, decides actions, executes them. The LLM loop is inside the library. Great for standalone automation scripts; incompatible with platforms that already own the agent loop. Python-only.
Stagehand — Accessibility tree + natural language primitives (page.act("click login")). Convenient, but the LLM re-interprets which element to target on every single call — non-deterministic by design.
Playwright MCP — Same snapshot philosophy as browserclaw, but locked to the MCP protocol. Great for chat-based agents, but not embeddable as a library — you can't compose it into your own agent loop or call it from application code.

Why this matters for repeated complex UI tasks

When you're running the same multi-step workflow hundreds of times — filling forms, navigating dashboards, processing queues — the differences compound:

Cost: ~4x fewer tokens per run than vision-based tools. A 20-step task repeated 100 times: ~3M tokens vs ~12M+.
Speed: No vision API round-trips. A 20-step workflow finishes in seconds, not minutes.
Reliability: Ref-based targeting is deterministic. Same page state → same refs → same result. No coordinate guessing, no LLM re-interpretation.
Simplicity: No framework opinions, no agent loop, no hosted platform. Just snapshot() → read refs → act. Compose it into whatever agent architecture you want.

Try It Live — Or On Your Machine

browserclaw.org is an open-source playground where you can type a prompt and watch an AI agent use browserclaw in a real browser — live. No setup, no API keys, just a text box and a browser stream.

Want to run it yourself? The source is at github.com/idan-rubin/browserclaw-agent — spin it up with Docker or Node.js. Supports Groq, Gemini, OpenAI, and Anthropic out of the box.

Install

npm install browserclaw

Requires a Chromium-based browser installed on the system (Chrome, Brave, Edge, or Chromium). browserclaw auto-detects your installed browser — no need to install Playwright browsers separately.

How It Works

┌─────────────┐     snapshot()     ┌──────────────────────────────────────────┐
│  Web Page   │ ──────────────►    │  AI-readable text tree                   │
│             │                    │                                          │
│  [buttons]  │                    │  - heading "todos"                       │
│  [links]    │                    │  - textbox "What needs to be done?" [e1] │
│  [inputs]   │                    │  - link "Playwright" [e2]                │
└─────────────┘                    └──────────────┬───────────────────────────┘
                                                  │
                                          AI reads snapshot,
                                          decides: type in e1
                                                  │
┌─────────────┐   type('e1',...)   ┌──────────────▼──────────────────┐
│  Web Page   │ ◄──────────────    │  Ref "e1" resolves to a         │
│  (updated)  │                    │  Playwright locator — one ref,  │
│             │                    │  one exact element              │
└─────────────┘                    └─────────────────────────────────┘

Snapshot a page → get an AI-readable text tree with numbered refs (e1, e2, e3...)
AI reads the snapshot text and picks a ref to act on
Actions target refs → browserclaw resolves each ref to a Playwright locator and executes the action

Note: Refs are scoped to the snapshot that created them. After navigation or DOM changes, old refs become invalid — actions will fail with an error (timeout in aria mode, "Unknown ref" in role mode). Always re-snapshot before acting on a changed page.

API

Launch & Connect

// Launch a new Chrome instance (auto-detects Chrome/Brave/Edge/Chromium)
const browser = await BrowserClaw.launch({
  url: 'https://demo.playwright.dev/todomvc', // navigate initial tab (no extra tabs)
  headless: false, // default: false (visible window)
  executablePath: '...', // optional: specific browser path
  cdpPort: 9222, // default: 9222
  noSandbox: false, // default: false (set true for Docker/CI)
  ignoreHTTPSErrors: false, // default: false (set true for expired local dev certs)
  userDataDir: '...', // optional: custom user data directory
  profileName: 'browserclaw', // profile name in Chrome title bar
  profileColor: '#FF4500', // profile accent color (hex)
  chromeArgs: ['--start-maximized'], // additional Chrome flags
  isolated: true, // fresh per-run profile, auto-cleaned on stop()
});

// Connect to an already-running Chrome instance
const browser = await BrowserClaw.connect('http://localhost:9222');

// Auto-discovery: scans common CDP ports (9222-9226, 9229)
const browser = await BrowserClaw.connect();

connect() checks that Chrome is reachable, then the internal CDP connection retries 3 times with increasing timeouts (5 s, 7 s, 9 s) — safe for Docker/CI where Chrome starts slowly.

Anti-detection: launch() always passes Chrome the flag that disables the AutomationControlled Blink feature. connect() attaches to an already-running Chrome, so it cannot add launch flags retroactively. To inject JavaScript stealth patches for navigator.webdriver, plugins, WebGL vendor, and related browser signals, pass stealth: true to launch() or connect().

Isolated profiles (per-run, per-process)

Pass isolated: true (or isolated: 'some-label') to launch in a dedicated per-run profile under $TMPDIR/browserclaw/isolated/:

A run-scoped random suffix is always appended — including when you pass a label string. Two concurrent launches with the same label (isolated: 'my-run') each get a unique directory and never collide on Chrome's SingletonLock. The label is for identification only; it does not produce a stable profile across runs.
stop() removes the isolated user-data directory on exit (best-effort; silent on failure). If the process crashes before stop(), leftover directories remain under $TMPDIR/browserclaw/isolated/ and can be deleted safely when no Chrome process is using them.
When isolated is set, profileName and userDataDir options are ignored.
Any cookies, logins, extensions, or localStorage from prior runs are not available — by design.

For a stable, shared profile across runs (persistent login state, preserved history), omit isolated and use profileName / userDataDir instead.

SSRF policy (navigating agent-supplied URLs)

By default, browserclaw permits navigation to any address — including private/loopback ranges such as 127.0.0.1, 10.0.0.0/8, and cloud metadata endpoints like 169.254.169.254. This "trusted-network" default is convenient for local development and dev-tunnel workflows.

If your agent navigates to URLs it received from an untrusted source (LLM output, user input, external API), you should opt into strict public-only enforcement:

const browser = await BrowserClaw.launch({
  ssrfPolicy: {
    dangerouslyAllowPrivateNetwork: false, // block loopback, RFC1918, link-local, metadata endpoints
    hostnameAllowlist: ['*.example.com'], // optional allowlist
    allowedHostnames: ['internal.myapp.com'], // optional private-IP exceptions
  },
});

Under strict mode browserclaw resolves DNS up front, pins the result, validates every resolved address against the policy, and re-checks redirect chains. The DNS cache is keyed by policy, so a permissive call does not leak cached private IPs to a later strict call.

Pages & Tabs

const page = await browser.open('https://demo.playwright.dev/todomvc');
const current = await browser.currentPage(); // get first usable (non-blank) tab
const tabs = await browser.tabs(); // list all tabs
const handle = browser.page(tabs[0].targetId); // wrap existing tab
const appPage = await browser.waitForTab({ urlContains: 'app-web' });
await browser.focus(tabId); // bring tab to front
await browser.close(tabId); // close a tab
await browser.stop(); // stop browser + cleanup

page.id; // CDP target ID (use with focus/close/page)
await page.url(); // current page URL
await page.title(); // current page title
browser.url; // CDP endpoint URL

Recovering tab handles

Tab handles can get out of sync if the app rewrites its URL aggressively or replaces the top-level target. Use the recovery primitives to re-bind a CrawlPage without having to restart the session:

// Attempts to refresh the cached targetId, optionally falling back to the
// best-effort resolver if the original target is gone.
await page.refreshTargetId();
await page.refreshTargetId({ fallback: 'active' });

// Rebind the handle using the best-effort resolver: prefers the old
// targetId, then the old URL, then a non-blank tab, then any tab.
await page.reacquire();

Contract — heuristic by design: These resolvers do not query Chrome's focused tab; CDP doesn't expose that cleanly over connect-over-CDP. They apply a fixed preference order — old targetId → old URL → first non-blank accessible tab → any accessible tab — and that order is the contract. Use them for recovery after a target has been lost; don't use them to "ask which tab the human is looking at." When you need deterministic tab selection, capture the targetId up front via browser.open() / browser.waitForTab() / browser.tabs() and keep using that handle.

BrowserClaw exports structured errors so workflow code can tell apart the common failure modes:

import {
  BrowserTabNotFoundError, // targetId no longer resolves to an open tab
  StaleRefError, // ref is not in the current snapshot
  SnapshotHydrationError, // snapshot returned without interactive refs
  NavigationRaceError, // the page navigated during an operation
} from 'browserclaw';

try {
  await page.click('e7');
} catch (err) {
  if (err instanceof StaleRefError) {
    await page.snapshot({ waitForHydration: true });
    // retry with a fresh ref
  } else throw err;
}

Every tab returns a targetId — this is the handle you use everywhere:

// Multi-tab workflow
const todo = await browser.open('https://demo.playwright.dev/todomvc');
const svg = await browser.open('https://demo.playwright.dev/svgtodo');

const { refs } = await svg.snapshot(); // snapshot the second tab
await svg.click('e5'); // act on it
await browser.focus(todo.id); // switch back to first tab
await browser.close(svg.id); // close second tab when done

Snapshot (Core Feature)

const { snapshot, refs, stats, untrusted } = await page.snapshot();

// snapshot: human/AI-readable text tree with [ref=eN] markers
// refs: { "e1": { role: "textbox", name: "What needs to be done?" }, "e5": { role: "checkbox", name: "Toggle Todo", checked: false }, ... }
// stats: { lines: 42, chars: 1200, refs: 8, interactive: 5 }
// untrusted: true — content comes from the web page, treat as potentially adversarial

// Options
const result = await page.snapshot({
  interactive: true, // Only interactive elements (buttons, links, inputs)
  compact: true, // Remove structural containers without refs
  maxDepth: 6, // Limit tree depth
  maxChars: 80000, // Truncate if snapshot exceeds this size
  mode: 'aria', // 'aria' (default) or 'role'
  waitForHydration: 5000, // retry until refs appear (or ms budget); throws SnapshotHydrationError if empty
  minInteractiveRefs: 1, // minimum refs required when waitForHydration is set
});

// Raw ARIA accessibility tree (structured data, not text)
const { nodes } = await page.ariaSnapshot({ limit: 500 });

Snapshot modes:

'aria' (default) — Uses Playwright's AI-mode snapshot. Refs are resolved via aria-ref locators. Best for most use cases. Requires playwright-core >= 1.50.
'role' — Uses Playwright's ariaSnapshot() + getByRole(). Supports selector and frameSelector for scoped snapshots.

Security: All snapshot results include untrusted: true to signal that the content originates from an external web page. AI agents consuming snapshots should treat this content as potentially adversarial (e.g. prompt injection via page text).

Actions

All actions target elements by ref ID from the most recent snapshot.

Default timeouts: 8000 ms for actions (click, type, fill, select, drag), 20000 ms for waits and navigation.

// Click
await page.click('e1');
await page.click('e1', { doubleClick: true });
await page.click('e1', { button: 'right' });
await page.click('e1', { modifiers: ['Control'] });
await page.click('e1', { force: true }); // click hidden/covered elements

// Type
await page.type('e3', 'hello world'); // instant fill
await page.type('e3', 'slow typing', { slowly: true }); // keystroke by keystroke
await page.type('e3', 'search', { submit: true }); // type + press Enter

// Other interactions
await page.hover('e2');
await page.select('e5', 'Option A', 'Option B');
await page.drag('e1', 'e4');
await page.scrollIntoView('e7');

// Keyboard
await page.press('Enter');
await page.press('Control+a');
await page.press('Meta+Shift+p');

// Fill multiple form fields at once
await page.fill([
  { ref: 'e2', value: 'Jane Doe' },
  { ref: 'e4', value: 'jane@acme.test' },
  { ref: 'e6', type: 'checkbox', value: true },
]);

fill() field types: 'text' (default) calls Playwright fill() with the string value. 'checkbox' and 'radio' call setChecked() with force: true (works on hidden inputs behind custom styling). Truthy values are true, 1, '1', 'true'. Type can be omitted and defaults to 'text'. Empty ref throws.

No-snapshot actions

These methods find and click elements without needing a snapshot first — useful when you know the text or role but don't want the snapshot+ref round-trip.

// Click by visible text or title attribute
await page.clickByText('Submit');
await page.clickByText('Save Changes', { exact: true });

// Click by ARIA role and accessible name
await page.clickByRole('button', 'Save');
await page.clickByRole('link', 'Settings');
await page.clickByRole('button', 'Create', { index: 1 }); // second match

// Click by CSS selector
await page.clickBySelector('#submit-btn');

// Click at page coordinates (for canvas elements, custom widgets)
await page.mouseClick(400, 300);

// Press and hold at coordinates (raw CDP events, bypasses automation detection)
await page.pressAndHold(400, 300, { holdMs: 5000, delay: 150 });

Highlight

await page.highlight('e1'); // Playwright built-in highlight

File Upload

Upload paths are confined to a sandboxed directory: $TMPDIR/browserclaw/uploads (e.g. /tmp/browserclaw/uploads on Linux). Files must exist inside this directory before uploading — paths outside it are rejected. Stage the file first, then reference it by path:

import { DEFAULT_UPLOAD_DIR } from 'browserclaw';
import { copyFile, mkdir } from 'node:fs/promises';
import { join } from 'node:path';

// Stage the file inside the sandboxed uploads directory
await mkdir(DEFAULT_UPLOAD_DIR, { recursive: true });
const staged = join(DEFAULT_UPLOAD_DIR, 'file.pdf');
await copyFile('/path/to/file.pdf', staged);

// Direct: set files on an <input type="file">
await page.uploadFile('e3', [staged]);

// Arm pattern: for non-input file pickers
const uploadDone = page.armFileUpload([staged]);
await page.click('e3'); // triggers the file chooser
await uploadDone;

Dialog Handling

Handle JavaScript dialogs (alert, confirm, prompt). Arm the handler before the action that triggers the dialog.

const dialogDone = page.armDialog({ accept: true });
await page.click('e5'); // triggers confirm()
await dialogDone;

// With prompt text
const promptDone = page.armDialog({ accept: true, promptText: 'my answer' });
await page.click('e6'); // triggers prompt()
await promptDone;

// Persistent handler: called for every dialog until cleared
await page.onDialog((event) => {
  console.log(`${event.type}: ${event.message}`);
  event.accept(); // or event.dismiss()
});
await page.onDialog(undefined); // clear the handler

By default, unexpected dialogs are auto-dismissed to prevent ProtocolError crashes.

Navigation & Waiting

await page.goto('https://demo.playwright.dev/todomvc');
await page.reload(); // reload the current page
await page.goBack(); // navigate back in history
await page.goForward(); // navigate forward in history
await page.waitFor({ loadState: 'networkidle' });
await page.waitFor({ text: 'Welcome' });
await page.waitFor({ textGone: 'Loading...' });
await page.waitFor({ url: '**/dashboard' });
await page.waitFor({ selector: '.loaded' }); // wait for CSS selector
await page.waitFor({ fn: '() => document.readyState === "complete"' }); // custom JS (string)
await page.waitFor({ fn: () => document.title === 'Done' }); // custom JS (function)
await page.waitFor({ fn: (name) => document.querySelector('button')?.textContent === name, arg: 'Save' }); // with arg
await page.waitFor({ timeMs: 1000 }); // sleep
await page.waitFor({ text: 'Ready', timeoutMs: 5000 }); // custom timeout

Capture

// Screenshots
const screenshot = await page.screenshot(); // viewport PNG → Buffer
const fullPage = await page.screenshot({ fullPage: true }); // full scrollable page
const element = await page.screenshot({ ref: 'e1' }); // specific element by ref
const bySelector = await page.screenshot({ element: '.hero' }); // by CSS selector
const jpeg = await page.screenshot({ type: 'jpeg' }); // JPEG format

// PDF
const pdf = await page.pdf(); // PDF export (headless only)

// Labeled screenshot — numbered badges on each ref for visual debugging
const { buffer, labels, skipped } = await page.screenshotWithLabels(['e1', 'e2', 'e3']);
// buffer: PNG with numbered overlays
// labels: [{ ref: 'e1', index: 1, box: { x, y, width, height } }, ...]
// skipped: refs that couldn't be found or had no bounding box

Both screenshot() and pdf() return a Buffer. Write to file with fs.writeFileSync('out.png', screenshot).

Trace Recording

Capture Playwright traces (screenshots, DOM snapshots, network) for debugging.

await page.traceStart({ screenshots: true, snapshots: true });
// ... perform actions ...
await page.traceStop('trace.zip');
// Open with: npx playwright show-trace trace.zip

Response Body

Intercept a network response and read its body.

const resp = await page.responseBody('/api/data');
console.log(resp.status, resp.body);
// { url, status, headers, body, truncated }

Options: timeoutMs (default 30 s), maxChars (truncate body).

Wait For Request

Wait for a network request matching a URL pattern and get full request + response details, including POST body.

const reqPromise = page.waitForRequest('/api/submit', { method: 'POST' });
await page.click('e5'); // submit a form
const req = await reqPromise;
console.log(req.method, req.postData); // 'POST', '{"name":"Jane"}'
console.log(req.status, req.ok); // 200, true
console.log(req.responseBody); // '{"id":123}'
// { url, method, postData?, status, ok, responseBody?, truncated? }

Options: method (filter by HTTP method), timeoutMs (default 30 s), maxChars (truncate response body).

Activity Monitoring

Console messages, errors, and network requests are buffered automatically.

const logs = await page.consoleLogs(); // all messages
const errors = await page.consoleLogs({ level: 'error' }); // errors only
const recent = await page.consoleLogs({ clear: true }); // read and clear buffer
const pageErrors = await page.pageErrors(); // uncaught exceptions
const requests = await page.networkRequests({ filter: '/api' }); // filter by URL
const fresh = await page.networkRequests({ clear: true }); // read and clear buffer

Storage

// Cookies
const cookies = await page.cookies();
await page.setCookie({ name: 'token', value: 'abc', url: 'https://demo.playwright.dev' });
await page.clearCookies();

// localStorage / sessionStorage
const values = await page.storageGet('local');
const token = await page.storageGet('local', 'authToken');
await page.storageSet('local', 'key', 'value');
await page.storageClear('session');

Downloads

// Click a download link and save the file
const result = await page.download('e7', '/tmp/report.pdf');
console.log(result.suggestedFilename); // 'report.pdf'
// Returns: { url, suggestedFilename, path }

// Arm pattern: wait for next download (call before triggering)
const dlPromise = page.waitForDownload({ path: '/tmp/file.pdf' });
await page.click('e8'); // triggers download
const dl = await dlPromise;

Emulation

// Device emulation (viewport + user agent)
await page.setDevice('iPhone 13');

// Color scheme
await page.emulateMedia({ colorScheme: 'dark' });

// Geolocation
await page.setGeolocation({ latitude: 48.8566, longitude: 2.3522 }); // Paris
await page.setGeolocation({ clear: true }); // reset

// Locale & timezone
await page.setLocale('fr-FR');
await page.setTimezone('Europe/Paris');

// Network
await page.setOffline(true);
await page.setExtraHeaders({ 'X-Custom': 'value' });
await page.setHttpCredentials({ username: 'admin', password: 'secret' });
await page.setHttpCredentials({ clear: true }); // remove

Evaluate

Run JavaScript directly in the browser page context.

const title = await page.evaluate('() => document.title');
const text = await page.evaluate('(el) => el.textContent', { ref: 'e1' });
const count = await page.evaluate('() => document.querySelectorAll("img").length');

`evaluateInAllFrames(fn)`

Run JavaScript in ALL frames on the page, including cross-origin iframes. Playwright bypasses the same-origin policy via CDP, making this essential for interacting with embedded payment forms (Stripe, etc.).

const results = await page.evaluateInAllFrames(`() => {
  const el = document.querySelector('input[name="cardnumber"]');
  return el ? 'found' : null;
}`);
// Returns: [{ frameUrl: '...', frameName: '...', result: 'found' }, ...]

Viewport

await page.resize(1280, 720);

Examples

See the examples/ directory for runnable demos:

basic.ts — Navigate, snapshot, click a ref
form-fill.ts — Fill a multi-field form using refs
ai-agent.ts — AI agent loop pattern with Claude/GPT

Run from the source tree:

npx tsx examples/basic.ts

Requirements

Node.js >= 18
Chromium-based browser installed (Chrome, Brave, Edge, or Chromium)
playwright-core >= 1.50 (installed automatically as a dependency)

No need to install Playwright browsers — browserclaw uses your system's existing Chrome installation via CDP.

Contributing

Contributions welcome! Please:

Fork the repository
Create a feature branch (git checkout -b my-feature)
Make your changes
Run npm run typecheck && npm run build to verify
Submit a pull request

Acknowledgments

browserclaw was born from the browser automation module in OpenClaw, built by Peter Steinberger and an amazing community of contributors. The snapshot + ref system, CDP connection management, and Playwright integration originate from that project.

License

MIT

Back to Playwright