PyCaret — open-source ML platform

The engine, the control plane, and the UI — all in one self-hosted box.

Vision · Architecture · Roadmap · Spec · Quickstart · Agent guide


⚠ 4.0 is work in progress — you're looking at main (the 4.0 line)

PyCaret 4.0 is a ground-up architectural revamp. It now lives on main. The 3.x release line is frozen on PyPI as pycaret 3.4.0 (no further commits).

Track progress in docs/revamp/STATUS.md and docs/revamp/ROADMAP.md.


What you get

PyCaret is two layers of one product:

The engine (packages/engine/) — pip install pycaret. Config-driven, stateless, sklearn-composable. Use it in a notebook:

from pycaret.tasks import ClassificationExperiment
from pycaret.datasets import get_data

df = get_data("juice")
exp = ClassificationExperiment(target="Purchase", session_id=42).fit(df)
best = exp.compare_models().best
tuned = exp.tune_model(best).pipeline
exp.save_model(tuned, "baseline")

PyCaret Control Plane (services/api/ + apps/web/ + infra/) — the full self-hosted web platform that wraps the engine. Workspaces, projects, datasets, experiments, runs, artifacts, deployments, monitoring, LLM-assisted experiment design. Run it on a laptop, a Docker host, or Kubernetes.
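To make "wraps the engine" concrete, here is a sketch of driving the Control Plane over HTTP. The endpoint paths, port, and payload fields are illustrative assumptions for this sketch, not the documented API; see docs/revamp/SPEC.md for the real contract.

import requests  # pip install requests

BASE = "http://localhost:3000/api"  # assumed; point at your own Control Plane

# Hypothetical flow: create a project, register a dataset, launch an experiment.
project = requests.post(f"{BASE}/projects", json={"name": "juice-demo"}).json()

with open("juice.csv", "rb") as fh:
    dataset = requests.post(
        f"{BASE}/projects/{project['id']}/datasets", files={"file": fh}
    ).json()

run = requests.post(
    f"{BASE}/experiments",
    json={"project_id": project["id"], "dataset_id": dataset["id"],
          "task": "classification", "target": "Purchase"},
).json()
print(run["id"], run["status"])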

Five deployment modes (current and roadmapped):

Mode                             For                                  Status
Notebook (pip install pycaret)   Data scientist workflow              4.0.0a1 on PyPI
Local dev (uv + npm)             Building against the Control Plane   ✅ shipped
Single-server Docker Compose     Small-team self-hosting              ✅ shipped
Kubernetes + Helm + Terraform    Enterprise cloud                     🔴 V2 (stubs scaffolded)
Electron desktop                 Analysts without Docker              🔴 V2 (stub scaffolded)

Repo layout

pycaret/                  ← monorepo
├── packages/
│   └── engine/           → `pycaret` on PyPI
├── services/
│   ├── api/              → `pycaret-server` on PyPI (FastAPI)
│   ├── worker/           (V2) background job runner
│   └── deployment-runtime/ (V2) standalone serving
├── apps/
│   ├── web/              React + Vite (Control Plane UI)
│   └── desktop/          (V2) Electron
├── infra/
│   ├── docker/           Dockerfile.api, Dockerfile.ui, compose
│   ├── helm/             (V2) Kubernetes chart
│   └── terraform/        (V2) AWS / GCP / Azure modules
└── docs/revamp/          VISION, SPEC, ARCHITECTURE, ROADMAP, STATUS, DECISIONS

See docs/revamp/ARCHITECTURE.md for the full system architecture.

Try it locally — 3 minutes

Just the engine, in a notebook:

pip install pycaret
# or with every optional extra:
pip install "pycaret[full]"

Supported: Python 3.11 / 3.12 / 3.13.

The full Control Plane, from source:

git clone https://github.com/pycaret/pycaret.git
cd pycaret

# Backend (terminal 1)
uv python install 3.13
uv sync --all-packages --all-extras
uv run --package pycaret-server pycaret-server serve --reload

# Frontend (terminal 2)
cd apps/web
npm install
npm run dev
# → http://localhost:3000/setup

Or with Docker (full stack, one command):

docker compose -f infra/docker/docker-compose.yml up --build
# → http://localhost:3000

See docs/revamp/PLATFORM_QUICKSTART.md for the full quickstart.

Engine quickstart

from pycaret.datasets import get_data
from pycaret.tasks import ClassificationExperiment
from pycaret import save_model, load_model

df = get_data("juice")
exp = ClassificationExperiment(target="Purchase", session_id=42).fit(df)

# Compare models — returns a typed CompareResult
result = exp.compare_models()
best = result.best
print(result.leaderboard)

# Tune — returns a TuneResult
tuned = exp.tune_model(best).pipeline

# Predict — returns a PredictResult
preds = exp.predict_model(tuned).predictions

# Save + load
save_model(tuned, "artifacts/best")
restored = load_model("artifacts/best")

Same shape for the other task types:

from pycaret.tasks import (
    RegressionExperiment,
    ClusteringExperiment,
    AnomalyExperiment,
    TimeSeriesExperiment,
)
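For instance, a regression run mirrors the classification quickstart above; only the experiment class, dataset, and target change. A minimal sketch (the "insurance" dataset with its "charges" target is used here as a plausible example):

from pycaret.datasets import get_data
from pycaret.tasks import RegressionExperiment

df = get_data("insurance")
exp = RegressionExperiment(target="charges", session_id=42).fit(df)

best = exp.compare_models().best              # CompareResult, same shape as classification
tuned = exp.tune_model(best).pipeline         # TuneResult
preds = exp.predict_model(tuned).predictions  # PredictResult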

Introspection — for UIs and LLM agents

from pycaret.api import (
    list_models, describe_model, list_metrics, describe_setup_params,
)

list_models("classification")           # -> list[ModelCard]
describe_model("classification", "lr")  # -> ModelCard
list_metrics("classification")          # -> list[MetricCard]

# UI-form schema — JSON-serializable, renders directly as a dynamic form
schema = describe_setup_params("classification")

The Control Plane UI renders its entire experiment-setup form from describe_setup_params; no parameter name is hard-coded in the UI.
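Because every card is a serializable dataclass (see "Who this is for" below), an agent can dump them straight to JSON. A minimal sketch, assuming the cards are standard dataclasses; the exact fields are whatever ModelCard defines:

import json
from dataclasses import asdict
from pycaret.api import describe_setup_params, list_models

# Each ModelCard is a dataclass, so asdict() yields plain JSON-ready dicts.
cards = [asdict(card) for card in list_models("classification")]
print(json.dumps(cards[0], indent=2, default=str))

# The setup-params schema is JSON-serializable by design; hand it to a form renderer as-is.
schema = describe_setup_params("classification")
print(json.dumps(schema, indent=2, default=str))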

Event stream

from pycaret.logging import MemoryLogger

log = MemoryLogger()
log.subscribe(lambda event: print(event.kind.value, event.message))

exp = ClassificationExperiment(target="y", logger=log).fit(df)
exp.compare_models()   # emits experiment.started → model.compare.finished → ...

The Control Plane backend subclasses BaseLogger with DBEventLogger — every engine event becomes a DB row and streams live to any connected WebSocket clients.
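The same subscribe hook is enough to build your own sink without touching the database. A sketch of a JSONL audit log, using only the kind and message fields shown above (df is the frame from the engine quickstart):

import json
from pycaret.logging import MemoryLogger
from pycaret.tasks import ClassificationExperiment

log = MemoryLogger()

def to_jsonl(event):
    # One JSON line per engine event: a file-based analogue of DBEventLogger.
    with open("events.jsonl", "a") as fh:
        fh.write(json.dumps({"kind": event.kind.value, "message": event.message}) + "\n")

log.subscribe(to_jsonl)
exp = ClassificationExperiment(target="Purchase", session_id=42, logger=log).fit(df)
exp.compare_models()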

What's deliberately not here

  • Module-level functional API (setup, compare_models) — use OOP Experiment classes; see the migration sketch below.
  • External experiment trackers: mlflow, comet-ml, wandb, dagshub — the Control Plane owns this story now.
  • Distributed backends: fugue, dask, ray (V3 opt-in).
  • Visualization: yellowbrick, mljar-scikit-plot, schemdraw — Plotly-only rewrite in progress.
  • In-engine deployment helpers: create_api, create_app, create_docker, dashboard, convert_model, deploy_model — the Control Plane owns serving + deployment.
  • Drift / fairness in the engine: check_drift, check_fairness — moved to the monitoring layer.

See docs/revamp/KILL_LIST.md for the exhaustive list.
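For 3.x users, the first bullet is the biggest change. Side by side (the 3.x half is the old functional API; the 4.0 half matches the quickstart above):

# PyCaret 3.x (module-level functions, hidden global state):
#   from pycaret.classification import setup, compare_models
#   setup(df, target="Purchase", session_id=42)
#   best = compare_models()

# PyCaret 4.0 (explicit Experiment objects, no global state):
from pycaret.tasks import ClassificationExperiment
exp = ClassificationExperiment(target="Purchase", session_id=42).fit(df)
best = exp.compare_models().best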

Who this is for

  • Data scientists who want AutoML in a notebook without vendor lock-in.
  • ML engineers who want an open-source control plane they can self-host — train, deploy, monitor, improve.
  • Small teams (≤20 people) who need the whole loop without Databricks licenses.
  • Enterprises who need SSO + audit logs + multi-cloud deployment in the same repo they started prototyping with.
  • LLM agents that introspect and drive ML experiments — every model, metric, and parameter is a serializable dataclass.

See docs/revamp/VISION.md for the product statement.

Licensing

PyCaret 4.0 ships under the Functional Source License (FSL-1.1-MIT) — the same license Sentry, Convex, and Keep use. The short version:

  • Free for individual use, internal corporate use, non-commercial education / research, and consulting work delivered on top of PyCaret.
  • Not free to use as the basis of a competing AutoML product or hosted service.
  • Auto-converts to MIT two years after each release. The 4.0.0 release becomes plain MIT in 2028, each subsequent minor two years after its own release date, and so on.

See LICENSE for the full text.

Per-package detail:

  • packages/engine/ (the pycaret library) and apps/site/ are FSL-1.1-MIT.
  • services/api/ and apps/web/ (the Control Plane backend + frontend) are dual-licensed FSL-1.1-MIT OR BUSL-1.1. Self-host freely; multi-tenant hosted commercialisation falls under the BUSL-1.1 grant, which auto-converts after three years.

Rationale and the chain of decisions: docs/revamp/DECISIONS.md.

The 3.x line on PyPI (pycaret <= 3.4.0) remains MIT — license changes only apply to 4.0+.

Contributing

PyCaret is under active revamp and is Claude-Code-first: anyone can clone the repo, run Claude Code in their own checkout (using their own Claude credentials), pick a maintainer-Approved issue, and let the agent open a PR.

gh repo clone pycaret/pycaret && cd pycaret
claude
> /work-on-approved-issue

Compute is community-funded — there's no Claude API key in this repo's secrets and no CI bot that auto-fixes issues. The Claude Code setup lives in CLAUDE.md (entry point), .claude/ (slash commands + sub-agents + permissions), and per-directory CLAUDE.md files. Cross-vendor instructions for non-Claude agents live in AGENTS.md.

Traditional contributions — clone, edit, PR — are also first-class. Read CONTRIBUTING.md. Bug reports are welcome; large feature PRs should be discussed in an issue first.
