Binex is an open-source visual runtime for AI agent pipelines. Drag nodes. Pick any model. Run locally. Replay any step. Your data, your graph, your keys: none of it ever leaves your machine.
Agent orchestrators today are black boxes running on somebody else's computer. You send your prompts, your data, your keys — and hope the trace UI is good enough when it breaks.
Binex flips it. The whole runtime fits in a pip install. Every prompt, every tool call, every byte of context — visible, local, diffable, replayable.
Any model via LiteLLM. System prompt, tools, temperature, max tokens — all visible, all editable, all serialized to YAML.
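In graph YAML that's a single node. A minimal sketch, borrowing the node shape from the quick-start example further down; the `temperature` and `max_tokens` key names are illustrative guesses, not schema:

```yaml
- id: summarizer
  agent: llm://anthropic/claude-sonnet  # any LiteLLM model string
  prompt: prompts/summarizer.md         # the system prompt, versioned in your repo
  tools: [web_search]
  temperature: 0.2                      # key names from here down are guesses
  max_tokens: 2048
```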
Plain Python. Upstream artifacts arrive on stdin; whatever the script returns becomes the node's output. Deterministic steps between agents.
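A minimal sketch of wiring one in, assuming a `script://` scheme (the quick-start example below only confirms `llm://` and `human://`):

```yaml
- id: dedupe
  agent: script://scripts/dedupe.py  # assumed scheme for Python nodes
  inputs: [researcher]               # researcher's artifact arrives on stdin
```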
Pause the graph. A form appears in the UI or terminal. The run waits — your answer is the node's output.
Gate destructive steps behind a review. See the upstream output, approve or reject, add a note. It's logged with the run.
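Both human node types in one sketch; `human://approve` matches the quick-start YAML below, while `human://input` and the `fields` key are assumptions:

```yaml
- id: pick_angle
  agent: human://input    # assumed scheme: pause the run and ask a question
  fields: [angle, audience]

- id: review
  agent: human://approve  # as in the quick-start example
  inputs: [pick_angle]
```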
Terminal nodes for your pipeline's final deliverable. Shown with rich rendering in the run detail view.
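A sketch, assuming an `output://` scheme for terminal nodes:

```yaml
- id: report
  agent: output://markdown  # assumed scheme; rendered in the run detail view
  inputs: [writer]
```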
Call another agent over Google's A2A protocol — your graph becomes one node in someone else's bigger graph.
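A sketch, assuming an `a2a://` scheme and a placeholder remote endpoint:

```yaml
- id: fact_checker
  agent: a2a://agents.example.com/fact-checker  # hypothetical remote agent
  inputs: [synthesizer]
```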
Shell out to a terminal coding agent (Claude Code, Codex, Cursor). Sessions persist across runs.
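A sketch, assuming a `cli://` scheme; the `session` key is a guess at how persistence is spelled:

```yaml
- id: implement
  agent: cli://claude-code  # assumed scheme for terminal coding agents
  session: persist          # assumed key: reuse the session across runs
  inputs: [plan]
```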
Higher-order composition: fan-out, map-reduce, retry-with-critique. One node that expands into many at runtime.
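A sketch of a fan-out node; the `composite://map` scheme and the `over` / `child` keys are invented for illustration:

```yaml
- id: summarize_each
  agent: composite://map  # expands into one child node per item at runtime
  over: researcher        # iterate the researcher's list artifact
  child:
    agent: llm://openai/gpt-4.1-mini
    prompt: prompts/summarize_one.md
```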
Edit the graph by dragging, or by typing. Both sides stay in sync — commit the YAML, diff it in PRs like any other config.
Swap the model on one node and re-run just that step. The rest of the graph keeps its artifacts. Iterate in seconds, not minutes.
Every prompt, every tool call, every byte of context — pinned to a Gantt chart with anomaly detection baked in.
Side-by-side comparison with filtering: changed outputs, failed nodes, cost delta. Bisect divergence down to a single node.
Set hard dollar caps per run, per day, per project. Cron schedules. Alerts. Pause auto-approvals if a budget trips.
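The quick-start YAML below shows the per-run `budget` block; everything past `max_usd` here is a guess at how the day/project caps, schedule, and alerts might be spelled:

```yaml
budget:
  max_usd: 0.25              # per run, as in the example below
  max_usd_per_day: 5.00      # assumed keys from here down
  max_usd_per_project: 50.00
  on_breach: pause_auto_approvals
schedule: "0 7 * * 1-5"      # cron: weekday mornings
alerts: [email]
```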
10 built-in tools, plus any MCP server (stdio or HTTP), plus your own Python. Agents see exactly what you let them see.
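A sketch of mixing tool sources on one node; `web_search` and `fetch_url` match the built-ins used in the example below, while the `mcp://` and `py://` spellings are assumptions:

```yaml
tools:
  - web_search                      # built-in
  - fetch_url                       # built-in
  - mcp://localhost:3001            # assumed syntax: an HTTP MCP server
  - py://tools/internal_lookup.py   # assumed syntax: your own Python
```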
One pip install, one command. The UI opens in your browser at localhost:7860. No login. No account. No config.
Pick an LLM, write a prompt, drop a human-approve in the middle if you want. Or paste existing YAML — same graph, either way.
Watch the DAG light up as each node completes. Click any node to see inputs, outputs, prompts, tokens, cost. Replay if needed.
```yaml
# research.yaml — a 4-node pipeline
version: "1"
name: research_pipeline

nodes:
  - id: researcher
    agent: llm://anthropic/claude-sonnet
    prompt: prompts/researcher.md
    tools: [web_search, fetch_url]

  - id: synthesizer
    agent: llm://openai/gpt-4.1-mini
    inputs: [researcher]

  - id: review
    agent: human://approve
    inputs: [synthesizer]

  - id: writer
    agent: llm://anthropic/claude-sonnet
    inputs: [review]
    on_failure: retry(3)

budget:
  max_usd: 0.25
  max_tokens: 100_000
```
```console
# run it, watch it, replay it
$ binex run research.yaml \
    --input "agentic ui patterns, 2026"

# ↓ live trace
[12:04:18] researcher   → llm://anthropic runs
[12:04:23] researcher   ✓ 2,104 tok · $0.011
[12:04:23] synthesizer  → llm://openai runs
[12:04:26] synthesizer  ✓ 1,512 tok · $0.004
[12:04:26] review       ⏸ waiting for approval
[12:04:54] review       ✓ approved by alex
[12:04:54] writer       → llm://anthropic runs
[12:05:08] writer       ✓ 3,620 tok · $0.048

# oops — re-run just the writer with a different model
$ binex replay run/0421-a7f2 writer \
    --model llm://openai/gpt-5
```
Your first draft cost you $0.18 and six minutes. You want to try Claude instead of GPT on just the writer. Normal orchestrators re-run the entire pipeline. Binex doesn't. Swap the model, hit replay, get a new output in the time it takes to run a single LLM call. All other artifacts stay pinned.
| | Binex | typical cloud orchestrators |
|---|---|---|
| Where does your prompt data go? | stays on your machine | proxied via vendor |
| Telemetry sent on run | 0 bytes | required |
| Replay a single node | built-in | re-run the flow |
| Works offline / air-gapped | with Ollama, yes | no |
| Diff two runs side-by-side | built-in | log download |
| License | MIT — forever | proprietary |
| Pricing for 10k runs / mo | $0 + model costs | seat fee + per-run |
| Host it yourself | it is self-hosted | enterprise tier |
↑ Not a dig at cloud platforms — they solve real problems, especially for teams that need managed hosting and RBAC. Binex is for the crowd that'd rather just run it locally.
No gatekeeping. There's a zero-config `binex hello` that runs a toy 2-node pipeline, plus an examples folder you can copy-paste from.