open source · v0.7.5 · MIT licensed

Build agent workflows you can actually debug.

Binex is an open-source visual runtime for AI agent pipelines. Drag nodes. Pick any model. Run locally. Replay any step. Your data, your graph, your keys — none of it ever leaves your machine.

License
MIT
Models
40+ via LiteLLM
Telemetry
0 bytes sent
Install size
~12 MB
[hero preview: editor + trace for research_pipeline.yaml · run/0421-a7f2 · LIVE · 03:24 elapsed · 7 nodes · 9 edges]
§ 01 — positioning

A runtime for agent workflows that respects your laptop.

// what is binex
problem
01

Agent orchestrators today are black boxes running on somebody else's computer. You send your prompts, your data, your keys — and hope the trace UI is good enough when it breaks.

Binex flips it. The whole runtime fits in a pip install. Every prompt, every tool call, every byte of context — visible, local, diffable, replayable.

§ 02 — live demo

Watch a multi-agent pipeline run, node by node.

// simulation · no API calls
research_pipeline · run 0421-a7f2 ● LIVE · $0.00 · 00:00

progress: nodes 0/8 · tokens in 0 · tokens out 0 · artifacts 0

event log

§ 03 — node types

Eight kinds of work. One graph.

// agent URI scheme
llm agent · 01
LLM Agent
llm://anthropic/claude-sonnet

Any model via LiteLLM. System prompt, tools, temperature, max tokens — all visible, all editable, all serialized to YAML.

local script · 02
Local Script
local://scripts/parse.py

Plain Python. Upstream artifacts arrive on stdin; whatever the script writes back becomes its output. Deterministic steps between agents.
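A local script node can be a plain file like this. It's a sketch: the JSON-on-stdin artifact shape and the `researcher` key are assumptions for illustration, not Binex's documented format.

```python
# Sketch of a local:// script node. The JSON-on-stdin artifact shape is an
# assumption for illustration, not Binex's documented format.
import json
import sys


def transform(artifacts: dict) -> dict:
    """Deterministic step: count words in the upstream 'researcher' artifact."""
    text = artifacts.get("researcher", "")
    return {"word_count": len(text.split())}


def main() -> None:
    upstream = json.load(sys.stdin)             # upstream artifacts in
    json.dump(transform(upstream), sys.stdout)  # this node's artifact out
```

Because the step is pure Python with no model call, it runs the same way every time, which is exactly what you want between two non-deterministic agents.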

human input · 03
Human Input
human://input

Pause the graph. A form appears in the UI or terminal. The run waits — your answer is the node's output.

human approve · 04
Human Approve
human://approve

Gate destructive steps behind a review. See the upstream output, approve or reject, add a note. It's logged with the run.

human output · 05
Human Output
human://output

Terminal nodes for your pipeline's final deliverable. Shown with rich rendering in the run detail view.

agent-to-agent · 06
A2A Agent
a2a://team/researcher

Call another agent over Google's A2A protocol — your graph becomes one node in someone else's bigger graph.

cli agent · 07
CAO Agent
cao://claude-code/session

Shell out to a terminal coding agent (Claude Code, Codex, Cursor). Sessions persist across runs.

pattern · 08
Pattern Node
pattern://map-reduce

Higher-order composition: fan-out, map-reduce, retry-with-critique. One node that expands into many at runtime.
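The expansion idea, sketched in Python. The node and edge shapes here are illustrative assumptions, not Binex internals: the point is just that one pattern node becomes N mapper nodes plus a fan-in reducer at runtime.

```python
# Sketch: expand one pattern://map-reduce node into N mappers + 1 reducer.
# The node dict shape is an illustrative assumption, not Binex's real schema.
def expand_map_reduce(node_id, items, worker_agent, reducer_agent):
    """Turn a single pattern node into a fan-out/fan-in subgraph."""
    mappers = [
        {"id": f"{node_id}.map.{i}", "agent": worker_agent, "input": item}
        for i, item in enumerate(items)
    ]
    reducer = {
        "id": f"{node_id}.reduce",
        "agent": reducer_agent,
        "inputs": [m["id"] for m in mappers],  # fan-in from every mapper
    }
    return mappers + [reducer]


# One pattern node over three files expands into four concrete nodes.
nodes = expand_map_reduce(
    "summarize",
    ["a.md", "b.md", "c.md"],
    "llm://openai/gpt-4.1-mini",
    "llm://anthropic/claude-sonnet",
)
```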

§ 04 — features

Everything you need to treat agent pipelines like software.

// six things that matter
01 / 06

Visual ↔ YAML sync

Edit the graph by dragging, or by typing. Both sides stay in sync — commit the YAML, diff it in PRs like any other config.

nodes:
  - id: researcher
    agent: llm://gpt-4.1-mini
  - id: writer
02 / 06

Replay any node

Swap the model on one node and re-run just that step. The rest of the graph keeps its artifacts. Iterate in seconds, not minutes.

↺ node: researcher
  model: gpt-4o → gpt-4.1-mini
  status: ✓ 0.8s  artifacts: kept
  tokens: 1,247  ↓ 73% cost
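Why replay is cheap, as a sketch: cache each node's artifact under a hash of its config plus its inputs, so swapping one node's model invalidates only that node. The hashing scheme and class below are illustrative assumptions, not Binex internals.

```python
# Sketch of content-addressed node caching; an illustrative assumption,
# not Binex's real implementation.
import hashlib
import json


def node_key(config: dict, upstream: list) -> str:
    """Cache key = hash of the node's config plus its upstream artifacts."""
    payload = json.dumps([config, upstream], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class ArtifactCache:
    def __init__(self):
        self.store = {}
        self.misses = 0

    def run(self, config, upstream, execute):
        key = node_key(config, upstream)
        if key not in self.store:  # same config + same inputs → reuse artifact
            self.misses += 1
            self.store[key] = execute(config, upstream)
        return self.store[key]
```

Changing `model` on the writer changes only the writer's key; every upstream node keeps hitting the cache, which is why a replay costs one LLM call instead of a full pipeline run.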
03 / 06

Full trace timeline

Every prompt, every tool call, every byte of context — pinned to a Gantt chart with anomaly detection baked in.

researcher  ██████░░░░ 0.8s
writer      ░░░████░░░ 1.2s
reviewer    ░░░░░░░███ 0.4s
total: 2.4s  anomalies: 0
04 / 06

Diff two runs

Side-by-side comparison with filtering: changed outputs, failed nodes, cost delta. Bisect divergence down to a single node.

- "prefer bullet points"
+ "prefer prose paragraphs"
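A minimal sketch of what a run diff computes. The manifest shape (`nodes`, `total_usd`) is assumed for illustration; the real format may differ.

```python
# Sketch of diffing two run manifests; the manifest shape is an assumption.
def diff_runs(a: dict, b: dict) -> dict:
    """Report nodes whose output changed, nodes that failed, and cost delta."""
    changed = sorted(
        n for n in a["nodes"]
        if n in b["nodes"] and a["nodes"][n]["output"] != b["nodes"][n]["output"]
    )
    failed = sorted(n for n, v in b["nodes"].items() if v.get("status") == "failed")
    return {
        "changed": changed,
        "failed": failed,
        "cost_delta_usd": round(b["total_usd"] - a["total_usd"], 4),
    }
```

Filtering to `changed` first is what makes bisection practical: the first node whose output diverges is usually the one whose prompt or model you touched.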
05 / 06

Budgets & scheduling

Set hard dollar caps per run, per day, per project. Cron schedules. Alerts. Pause auto-approvals if a budget trips.

daily cap:  $2.00
used today: $0.43 ▓░░░░░░░░░
run cap:    $0.25
schedule:   0 9 * * 1-5
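The cap logic, as a sketch. The class and method names are assumptions, not Binex's API; the idea is just that a charge which would cross the cap is refused before it happens, not flagged after.

```python
# Sketch of a hard per-run dollar cap; class/method names are assumptions.
class BudgetExceeded(RuntimeError):
    pass


class RunBudget:
    """Track spend against a cap; refuse any charge that would cross it."""

    def __init__(self, max_usd: float):
        self.max_usd = max_usd
        self.spent = 0.0

    def charge(self, usd: float) -> None:
        if self.spent + usd > self.max_usd:
            raise BudgetExceeded(
                f"run cap ${self.max_usd:.2f} would be exceeded "
                f"(spent ${self.spent:.2f}, next charge ${usd:.2f})"
            )
        self.spent += usd
```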
06 / 06

MCP + custom tools

10 built-in tools, plus any MCP server (stdio or HTTP), plus your own Python. Agents see exactly what you let them see.

[mcp] filesystem  [mcp] github  [py] parse_pdf.py
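Registering your own Python as a tool could look roughly like this. The decorator and registry shape are assumptions, not Binex's real API; the point is the idea itself: a plain function whose name and docstring the agent sees.

```python
# Sketch of a custom-tool registry; the decorator/registry shape is an
# assumption, not Binex's real API.
TOOLS: dict = {}


def tool(fn):
    """Expose a function to agents by name, docstring as its description."""
    TOOLS[fn.__name__] = {"fn": fn, "description": fn.__doc__}
    return fn


@tool
def word_count(text: str) -> int:
    """Count whitespace-separated words in a string."""
    return len(text.split())


def call_tool(name: str, **kwargs):
    """Dispatch an agent's tool call to the registered function."""
    return TOOLS[name]["fn"](**kwargs)
```

Agents only ever see what is in the registry, which is the "exactly what you let them see" guarantee in practice.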
§ 05 — providers

Any model. Swap in one line.

// 40+ via LiteLLM
llm://anthropic/claude-sonnet-4.5 · 200K ctx · tools
llm://openai/gpt-5 · frontier
llm://openai/gpt-4.1-mini · fast · cheap
llm://google/gemini-2.5-pro · 1M ctx
llm://xai/grok-3 · reasoning
llm://deepseek/deepseek-r1 · open weights
llm://meta/llama-4-scout · open weights
llm://mistral/mistral-large · eu hosted
llm://cohere/command-a · rag-tuned
llm://groq/llama-3.3-70b · fastest tps
llm://together/qwen-2.5-72b · cheap
llm://fireworks/deepseek-v3 · serverless
llm://ollama/qwen2.5:14b · local · free
llm://ollama/llama3.2:8b · local · free
llm://lm-studio · any gguf · local
llm://openrouter · +300 models · one key
llm://bedrock · aws hosted · enterprise
llm://azure · ms hosted · enterprise
see all · +22 more · via litellm
§ 06 — how it works

Install, draw, run. Three moves.

// < 3 min to first run
step 01

Install and launch the UI.

One pip install, one command. The UI opens in your browser at localhost:7860. No login. No account. No config.

$
pip install binex
binex ui
→ http://localhost:7860
step 02

Drag nodes, connect edges.

Pick an LLM, write a prompt, drop a human-approve in the middle if you want. Or paste existing YAML — same graph, either way.

llm · human · output
step 03

Run it. Debug every step.

Watch the DAG light up as each node completes. Click any node to see inputs, outputs, prompts, tokens, cost. Replay if needed.

researcher · 2.1s · 3,208 tok
synthesizer · 1.7s · 2,104 tok
writer · running · 0.8s...
reviewer · pending
§ 07 — under the hood

The YAML is the graph. The graph is the YAML.

// no magic
research.yaml · 32 lines · MIT
# research.yaml — a 4-node pipeline
version: "1"
name: research_pipeline

nodes:
  - id: researcher
    agent: llm://anthropic/claude-sonnet
    prompt: prompts/researcher.md
    tools: [web_search, fetch_url]

  - id: synthesizer
    agent: llm://openai/gpt-4.1-mini
    inputs: [researcher]

  - id: review
    agent: human://approve
    inputs: [synthesizer]

  - id: writer
    agent: llm://anthropic/claude-sonnet
    inputs: [review]
    on_failure: retry(3)

budget:
  max_usd: 0.25
  max_tokens: 100_000
run.sh · what you'd actually type
# run it, watch it, replay it
$ binex run research.yaml \
    --input "agentic ui patterns, 2026"

# ↓ live trace
[12:04:18] researcher   → llm://anthropic   runs
[12:04:23] researcher   ✓ 2,104 tok · $0.011
[12:04:23] synthesizer  → llm://openai      runs
[12:04:26] synthesizer  ✓ 1,512 tok · $0.004
[12:04:26] review       ⏸ waiting for approval
[12:04:54] review       ✓ approved by alex
[12:04:54] writer       → llm://anthropic   runs
[12:05:08] writer       ✓ 3,620 tok · $0.048

# oops — re-run just the writer with a different model
$ binex replay run/0421-a7f2 writer \
    --model llm://openai/gpt-5
§ 08 — superpower

Break one node. Fix it. Never re-run the rest.

// replay
replay mode

Swap the model on one node. The graph keeps its artifacts.

Your first draft cost you $0.18 and six minutes. You want to try Claude instead of GPT on just the writer. Normal orchestrators re-run the entire pipeline. Binex doesn't. Swap the model, hit replay, get a new output in the time it takes to run a single LLM call. All other artifacts stay pinned.

researcher    CACHED
synthesizer   CACHED
review        CACHED
writer        RE-RUN ▼ swap only this node's model
  before: llm://openai/gpt-4o
  after:  llm://anthropic/sonnet
cost delta: -$0.13 · time: 14s (was 6m 12s)
§ 09 — vs cloud platforms

Why you'd pick a pip install over a SaaS login.

// honest comparison
                                   Binex                  typical cloud orchestrators
Where does your prompt data go?    stays on your machine  proxied via vendor
Telemetry sent on run              0 bytes                required
Replay a single node               built-in               re-run the flow
Works offline / air-gapped         with ollama, yes       no
Diff two runs side-by-side         built-in               log download
License                            MIT — forever          proprietary
Pricing for 10k runs / mo          $0 + model costs       seat fee + per-run
Host it yourself                   it is self-hosted      enterprise tier

↑ Not a dig at cloud platforms — they solve real problems, especially for teams that need managed hosting and RBAC. Binex is for the crowd that'd rather just run it locally.

§ 10 — community

Built in the open, with the people using it.

// github.com/alexli18/binex
stars on GitHub: 54 — early days, but growing fast. Every star genuinely helps.
contributors: 2
releases: v0.7.5
open issues: 8

First time with agent workflows?

No gatekeeping. There's a zero-config `binex hello` that runs a toy 2-node pipeline, plus an examples folder you can copy-paste from.

★ star on github ↘ discussions → read the docs