Binex is an open-source visual runtime for AI agent pipelines. Drag nodes. Pick any model. Run locally. Replay any step. Your data, your graph, your keys: none of it ever leaves your machine.
Agent orchestrators today are black boxes running on somebody else's computer. You send your prompts, your data, your keys — and hope the trace UI is good enough when it breaks.
Binex flips it. The whole runtime fits in a pip install. Every prompt, every tool call, every byte of context — visible, local, diffable, replayable.
Any model via LiteLLM. System prompt, tools, temperature, max tokens — all visible, all editable, all serialized to YAML.
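In graph YAML that's a single node. A minimal sketch, borrowing the node shape from the quick-start example further down; the `temperature` and `max_tokens` key names are illustrative guesses, not schema:

```yaml
- id: summarizer
  agent: llm://anthropic/claude-sonnet  # any LiteLLM model string
  prompt: prompts/summarizer.md         # the system prompt, versioned in your repo
  tools: [web_search]
  temperature: 0.2                      # key names from here down are guesses
  max_tokens: 2048
```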
Plain Python. Upstream artifacts arrive on stdin; whatever the script returns becomes the node's output. Deterministic steps between agents.
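A minimal sketch of wiring one in, assuming a `script://` scheme (the quick-start example below only confirms `llm://` and `human://`):

```yaml
- id: dedupe
  agent: script://scripts/dedupe.py  # assumed scheme for Python nodes
  inputs: [researcher]               # researcher's artifact arrives on stdin
```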
Pause the graph. A form appears in the UI or terminal. The run waits — your answer is the node's output.
Gate destructive steps behind a review. See the upstream output, approve or reject, add a note. It's logged with the run.
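Both human node types in one sketch; `human://approve` matches the quick-start YAML below, while `human://input` and the `fields` key are assumptions:

```yaml
- id: pick_angle
  agent: human://input    # assumed scheme: pause the run and ask a question
  fields: [angle, audience]

- id: review
  agent: human://approve  # as in the quick-start example
  inputs: [pick_angle]
```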
Terminal nodes for your pipeline's final deliverable. Shown with rich rendering in the run detail view.
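A sketch, assuming an `output://` scheme for terminal nodes:

```yaml
- id: report
  agent: output://markdown  # assumed scheme; rendered in the run detail view
  inputs: [writer]
```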
Call another agent over Google's A2A protocol — your graph becomes one node in someone else's bigger graph.
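A sketch, assuming an `a2a://` scheme and a placeholder remote endpoint:

```yaml
- id: fact_checker
  agent: a2a://agents.example.com/fact-checker  # hypothetical remote agent
  inputs: [synthesizer]
```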
Shell out to a terminal coding agent (Claude Code, Codex, Cursor). Sessions persist across runs.
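A sketch, assuming a `cli://` scheme; the `session` key is a guess at how persistence is spelled:

```yaml
- id: implement
  agent: cli://claude-code  # assumed scheme for terminal coding agents
  session: persist          # assumed key: reuse the session across runs
  inputs: [plan]
```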
Higher-order composition: fan-out, map-reduce, retry-with-critique. One node that expands into many at runtime.
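A sketch of a fan-out node; the `composite://map` scheme and the `over` / `child` keys are invented for illustration:

```yaml
- id: summarize_each
  agent: composite://map  # expands into one child node per item at runtime
  over: researcher        # iterate the researcher's list artifact
  child:
    agent: llm://openai/gpt-4.1-mini
    prompt: prompts/summarize_one.md
```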
Edit the graph by dragging, or by typing. Both sides stay in sync — commit the YAML, diff it in PRs like any other config.
Swap the model on one node and re-run just that step. The rest of the graph keeps its artifacts. Iterate in seconds, not minutes.
Every prompt, every tool call, every byte of context — pinned to a Gantt chart with anomaly detection baked in.
Side-by-side comparison with filtering: changed outputs, failed nodes, cost delta. Bisect divergence down to a single node.
Set hard dollar caps per run, per day, per project. Cron schedules. Alerts. Pause auto-approvals if a budget trips.
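The quick-start YAML below shows the per-run `budget` block; everything past `max_usd` here is a guess at how the day/project caps, schedule, and alerts might be spelled:

```yaml
budget:
  max_usd: 0.25              # per run, as in the example below
  max_usd_per_day: 5.00      # assumed keys from here down
  max_usd_per_project: 50.00
  on_breach: pause_auto_approvals
schedule: "0 7 * * 1-5"      # cron: weekday mornings
alerts: [email]
```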
10 built-in tools, plus any MCP server (stdio or HTTP), plus your own Python. Agents see exactly what you let them see.
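A sketch of mixing tool sources on one node; `web_search` and `fetch_url` match the built-ins used in the example below, while the `mcp://` and `py://` spellings are assumptions:

```yaml
tools:
  - web_search                      # built-in
  - fetch_url                       # built-in
  - mcp://localhost:3001            # assumed syntax: an HTTP MCP server
  - py://tools/internal_lookup.py   # assumed syntax: your own Python
```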
One pip install, one command. The UI opens in your browser at localhost:7860. No login. No account. No config.
Pick an LLM, write a prompt, drop a human-approve in the middle if you want. Or paste existing YAML — same graph, either way.
Watch the DAG light up as each node completes. Click any node to see inputs, outputs, prompts, tokens, cost. Replay if needed.
```yaml
# research.yaml — a 4-node pipeline
version: "1"
name: research_pipeline

nodes:
  - id: researcher
    agent: llm://anthropic/claude-sonnet
    prompt: prompts/researcher.md
    tools: [web_search, fetch_url]

  - id: synthesizer
    agent: llm://openai/gpt-4.1-mini
    inputs: [researcher]

  - id: review
    agent: human://approve
    inputs: [synthesizer]

  - id: writer
    agent: llm://anthropic/claude-sonnet
    inputs: [review]
    on_failure: retry(3)

budget:
  max_usd: 0.25
  max_tokens: 100_000
```
```console
# run it, watch it, replay it
$ binex run research.yaml \
    --input "agentic ui patterns, 2026"

# ↓ live trace
[12:04:18] researcher   → llm://anthropic runs
[12:04:23] researcher   ✓ 2,104 tok · $0.011
[12:04:23] synthesizer  → llm://openai runs
[12:04:26] synthesizer  ✓ 1,512 tok · $0.004
[12:04:26] review       ⏸ waiting for approval
[12:04:54] review       ✓ approved by alex
[12:04:54] writer       → llm://anthropic runs
[12:05:08] writer       ✓ 3,620 tok · $0.048

# oops — re-run just the writer with a different model
$ binex replay run/0421-a7f2 writer \
    --model llm://openai/gpt-5
```
Your first draft cost you $0.18 and six minutes. You want to try Claude instead of GPT on just the writer. Normal orchestrators re-run the entire pipeline. Binex doesn't. Swap the model, hit replay, get a new output in the time it takes to run a single LLM call. All other artifacts stay pinned.
| | Binex | typical cloud orchestrators |
|---|---|---|
| Where does your prompt data go? | stays on your machine | proxied via vendor |
| Telemetry sent on run | 0 bytes | required |
| Replay a single node | built-in | re-run the flow |
| Works offline / air-gapped | with Ollama, yes | no |
| Diff two runs side-by-side | built-in | log download |
| License | MIT — forever | proprietary |
| Pricing for 10k runs / mo | $0 + model costs | seat fee + per-run |
| Host it yourself | it is self-hosted | enterprise tier |
↑ Not a dig at cloud platforms — they solve real problems, especially for teams that need managed hosting and RBAC. Binex is for the crowd that'd rather just run it locally.
No gatekeeping. There's a zero-config `binex hello` that runs a toy 2-node pipeline, plus an examples folder you can copy-paste from.