Multi-Provider LLM Support

Binex supports mixing different LLM providers in a single workflow via LiteLLM model naming. You can combine local models (Ollama), cloud APIs (OpenAI, Anthropic, Gemini), and remote A2A agents in one pipeline.

Example: Multi-Provider Research Pipeline

The included examples/multi-provider-demo.yaml demonstrates a research pipeline that uses Ollama for planning/summarization and OpenRouter for parallel research:

name: multi-provider-research
description: "Research pipeline: Ollama plans & summarizes, OpenRouter researches"

nodes:
  user_input:
    agent: "human://input"
    system_prompt: "What would you like to research?"
    outputs: [result]

  planner:
    agent: "llm://ollama/gemma3:4b"
    system_prompt: >
      You are a research planner. Given a topic, create a structured
      research plan with 3 specific subtopics to investigate.
      Output a numbered list of research tasks. Be concise.
    inputs:
      topic: "${user_input.result}"
    outputs: [result]
    depends_on: [user_input]

  researcher1:
    agent: "llm://openrouter/z-ai/glm-4.5-air:free"
    system_prompt: >
      You are a thorough researcher. Investigate the first subtopic
      from the research plan. Provide findings with specific facts.
      Keep response under 200 words.
    inputs:
      plan: "${planner.result}"
    outputs: [result]
    depends_on: [planner]

  researcher2:
    agent: "llm://openrouter/stepfun/step-3.5-flash:free"
    system_prompt: >
      You are a thorough researcher. Investigate the second subtopic
      from the research plan. Provide findings with specific facts.
      Keep response under 200 words.
    inputs:
      plan: "${planner.result}"
    outputs: [result]
    depends_on: [planner]

  summarizer:
    agent: "llm://ollama/gemma3:4b"
    system_prompt: >
      You are a summarizer. Combine the research findings into a clear,
      well-structured final summary. Include key findings and conclusions.
      Keep response under 300 words.
    inputs:
      research1: "${researcher1.result}"
      research2: "${researcher2.result}"
    outputs: [result]
    depends_on: [researcher1, researcher2]

The DAG topology is: user_input -> planner -> [researcher1, researcher2] -> summarizer. The two researchers run in parallel since they share the same dependency.
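Binex resolves this ordering internally; purely as an illustration, the same batching can be reproduced with Python's stdlib graphlib (the node names match the demo, everything else here is a sketch, not Binex's scheduler):

```python
from graphlib import TopologicalSorter

# Dependency graph of the demo workflow: node -> set of predecessors.
deps = {
    "user_input": set(),
    "planner": {"user_input"},
    "researcher1": {"planner"},
    "researcher2": {"planner"},
    "summarizer": {"researcher1", "researcher2"},
}

ts = TopologicalSorter(deps)
ts.prepare()
batches = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # all nodes whose dependencies are satisfied
    batches.append(ready)
    for node in ready:
        ts.done(node)

print(batches)
# researcher1 and researcher2 land in the same batch, so they can run in parallel.
```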

Usage Modes

1. Direct — API keys in environment

Set provider API keys as environment variables and use standard LiteLLM model names in your workflow:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GEMINI_API_KEY=...

binex run examples/multi-provider-demo.yaml

LiteLLM routes each model name to the correct provider automatically. You can also put keys in a .env file in your project root — Binex loads it via python-dotenv at startup.
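For reference, the .env format is just KEY=VALUE lines. A simplified sketch of what python-dotenv does at startup (the real library additionally handles quoting, export prefixes, and variable interpolation; load_env_file is a hypothetical helper):

```python
import os

def load_env_file(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping comments and blanks (simplified)."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

env = load_env_file("# provider keys\nOPENAI_API_KEY=sk-test\n")
os.environ.update(env)  # make the keys visible to LiteLLM
```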

2. Ollama — fully local, no API keys

For local-only workflows, Ollama requires no API keys. Just make sure Ollama is running:

# Start Ollama (if not already running)
ollama serve

# Pull the model you need
ollama pull gemma3:4b

# Run the workflow
binex run my-workflow.yaml

Use the llm://ollama/<model> prefix in your workflow nodes:

nodes:
  writer:
    agent: "llm://ollama/gemma3:4b"
    system_prompt: "Write a short poem"
    outputs: [result]

3. Proxy — centralized routing via docker-compose

Run docker-compose with the included litellm_config.yaml for a single proxy endpoint:

# Set API keys
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GEMINI_API_KEY=...

cd docker
docker-compose up -d

The proxy exposes all configured models on http://localhost:4000. Route traffic through it using config.api_base on your workflow nodes or by setting LITELLM_API_BASE in your environment.
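The shipped litellm_config.yaml is not reproduced here, but LiteLLM proxy configs follow a standard model_list schema; a minimal sketch with hypothetical entries might look like:

```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet-4-20250514
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY
```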

4. Per-node config overrides

Use the optional config block on any node to set temperature, max_tokens, api_base, or api_key:

nodes:
  planner:
    agent: "llm://gpt-4o"
    system_prompt: "Plan the research"
    inputs:
      query: "${user.query}"
    outputs: [plan]
    config:
      temperature: 0.3
      max_tokens: 4096

  researcher:
    agent: "llm://gemini/gemini-2.0-flash"
    system_prompt: "Research the topic"
    inputs:
      questions: "${planner.plan}"
    outputs: [findings]
    depends_on: [planner]
    config:
      api_base: "http://localhost:4000"  # route through proxy

Config values are forwarded to litellm.acompletion(); any key left as None is omitted from the call.

Supported Providers

Binex includes a built-in registry of 9 providers:

| Provider   | Agent prefix       | Default model                      | API key env var    |
|------------|--------------------|------------------------------------|--------------------|
| Ollama     | llm://ollama/      | ollama/llama3.2                    | None (local)       |
| OpenAI     | llm://             | gpt-4o                             | OPENAI_API_KEY     |
| Anthropic  | llm://             | claude-sonnet-4-20250514           | ANTHROPIC_API_KEY  |
| Gemini     | llm://gemini/      | gemini/gemini-2.0-flash            | GEMINI_API_KEY     |
| Groq       | llm://groq/        | groq/llama3-70b-8192               | GROQ_API_KEY       |
| Mistral    | llm://mistral/     | mistral/mistral-large-latest       | MISTRAL_API_KEY    |
| DeepSeek   | llm://deepseek/    | deepseek/deepseek-chat             | DEEPSEEK_API_KEY   |
| Together   | llm://together_ai/ | together_ai/meta-llama/Llama-3-70b | TOGETHER_API_KEY   |
| OpenRouter | llm://openrouter/  | openrouter/google/gemini-2.5-flash | OPENROUTER_API_KEY |

Any model name supported by LiteLLM works with the llm:// prefix. See the LiteLLM docs for the full list.

A2A Agents

Use the a2a:// prefix to connect to remote A2A-compatible agent servers. The remote agent must expose POST /execute and GET /health endpoints:

nodes:
  analyzer:
    agent: "a2a://http://localhost:8001"
    system_prompt: "Analyze data"
    inputs:
      data: "${user.data}"
    outputs: [analysis]

The A2A adapter sends {task_id, skill, trace_id, artifacts} to the remote endpoint and expects {artifacts} in the response.
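The request envelope can be sketched as plain JSON construction (make_a2a_request is a hypothetical helper for illustration, not part of Binex's API):

```python
import json
import uuid

def make_a2a_request(skill: str, artifacts: dict) -> str:
    """Build the JSON body POSTed to the remote agent's /execute endpoint."""
    body = {
        "task_id": str(uuid.uuid4()),   # unique per task
        "skill": skill,                 # which remote capability to invoke
        "trace_id": str(uuid.uuid4()),  # correlates logs across services
        "artifacts": artifacts,         # inputs for the remote agent
    }
    return json.dumps(body)

request_json = make_a2a_request("analyze", {"data": "..."})
# The remote agent is expected to reply with {"artifacts": {...}}.
```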

Troubleshooting

Ollama not running

Error: Connection refused — http://localhost:11434

Make sure Ollama is running (ollama serve) and that you have pulled the model referenced in your workflow (ollama pull <model>). Ollama runs on port 11434 by default.

API key missing or invalid

Error: AuthenticationError — Invalid API key

Check that the appropriate environment variable is set. You can use binex doctor to verify your environment:

binex doctor

Alternatively, put your keys in a .env file in your project root:

# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

Model not found

Error: Model 'xyz' not found

Verify the model name matches LiteLLM's expected format. Use the provider table above as a reference. For Ollama, ensure the model is pulled locally. For cloud providers, check that the model name is valid for your account/tier.

Timeout or slow responses

If a node times out, you can increase the deadline in the workflow defaults or per-node:

defaults:
  deadline_ms: 120000  # 2 minutes

nodes:
  slow_node:
    agent: "llm://gpt-4o"
    system_prompt: "Detailed analysis"
    outputs: [result]
    deadline_ms: 300000  # 5 minutes for this node