# Testing

## Running Tests

Run the full suite:

```bash
python -m pytest tests/
```
The suite contains ~870 tests across ~75 unit test files and 1 integration test file.
Run a specific test file:

```bash
python -m pytest tests/unit/test_qa_phase4_core.py
```

Run tests matching a keyword:

```bash
python -m pytest tests/ -k "test_hello"
```

Run with verbose output:

```bash
python -m pytest tests/ -v
```
## Test Organization

```
tests/
├── conftest.py                  # Shared fixtures (sample workflows)
├── unit/                        # Unit tests (~75 files, ~860 tests)
│   ├── test_models_*.py         # Pydantic model validation
│   ├── test_cli_*.py            # CLI command tests
│   ├── test_dag.py              # DAG construction and traversal
│   ├── test_scheduler.py        # Scheduler logic (dependency resolution)
│   ├── test_dispatcher.py       # Dispatcher and adapter routing
│   ├── test_qa_*.py             # QA regression tests (see below)
│   └── ...
└── integration/
    └── test_orchestrator.py     # End-to-end orchestrator tests
```
## QA Test Files

QA tests follow a structured plan and are organized by phase:

| File | Focus | Tests |
|---|---|---|
| `test_qa_models.py` | Pydantic models, validation | ~20 |
| `test_qa_stores.py` | SQLite and filesystem stores | ~20 |
| `test_qa_dag_scheduler.py` | DAG and scheduler | ~15 |
| `test_qa_adapters_runtime.py` | Adapters, dispatcher | ~15 |
| `test_qa_cli_workflow.py` | CLI + workflow loading | ~15 |
| `test_qa_trace_registry.py` | Trace, registry | ~15 |
| `test_qa_replay.py` | Replay command | ~10 |
| `test_qa_phase2.py` | Agents, settings | 22 |
| `test_qa_phase3_cli.py` | CLI DX commands | 22 |
| `test_qa_phase4_core.py` | Runtime, stores, adapters | 27 |
| `test_qa_phase5a_security.py` | Security, E2E | 22 |
| `test_qa_phase5b_remaining.py` | Registry, trace, workflow, models | 21 |
## Async Test Configuration

All async tests are auto-detected. The `pyproject.toml` sets:

```toml
[tool.pytest.ini_options]
testpaths = ["tests"]
asyncio_mode = "auto"
```

This means you do not need the `@pytest.mark.asyncio` decorator. Simply define your test as `async def` and pytest-asyncio handles the rest:
```python
async def test_orchestrator_runs_two_nodes():
    orch = Orchestrator(
        artifact_store=InMemoryArtifactStore(),
        execution_store=InMemoryExecutionStore(),
    )
    # ... register adapters, run workflow
    summary = await orch.run_workflow(spec)
    assert summary.completed_nodes == 2
```
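What auto mode saves you can be sketched with nothing but the standard library: without it, each `async def` test body must be driven through an event loop by hand (or decorated with `@pytest.mark.asyncio`). The function names below are illustrative, not part of the project:

```python
import asyncio


async def fetch_answer():
    # Stand-in for an async test body
    await asyncio.sleep(0)
    return 42


# Without asyncio_mode = "auto", you would drive the coroutine yourself:
def test_fetch_answer_manually():
    assert asyncio.run(fetch_answer()) == 42


test_fetch_answer_manually()
```

With `asyncio_mode = "auto"`, pytest-asyncio performs the `asyncio.run` step for every collected `async def` test.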
## Shared Fixtures

Defined in `tests/conftest.py`:

- `sample_workflow_dict()` - Minimal 2-node workflow (producer -> consumer) with local echo agents
- `sample_research_workflow_dict()` - 5-node research pipeline (planner -> 2 researchers -> validator -> summarizer)
Usage:

```python
def test_workflow_parsing(sample_workflow_dict):
    spec = WorkflowSpec(**sample_workflow_dict)
    assert len(spec.nodes) == 2
    assert "producer" in spec.nodes
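For orientation, a fixture like `sample_workflow_dict` plausibly returns a plain dict mirroring the YAML workflow shape. The field names and agent URIs below are illustrative assumptions, not the real fixture:

```python
# Hypothetical shape of the fixture's return value; the actual
# conftest.py may differ in field names and agent URIs.
def sample_workflow_dict():
    return {
        "name": "sample",
        "nodes": {
            "producer": {
                "agent": "local://echo",
                "outputs": ["result"],
            },
            "consumer": {
                "agent": "local://echo",
                "depends_on": ["producer"],
                "outputs": ["final"],
            },
        },
    }


wf = sample_workflow_dict()
assert len(wf["nodes"]) == 2
assert "producer" in wf["nodes"]
```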
## Mocking Patterns

### CLI Store Patching

CLI commands use a `_get_stores()` helper that returns real SQLite + filesystem stores by default. Always patch this in tests to avoid hitting disk:

```python
from click.testing import CliRunner
from unittest.mock import patch

from binex.cli.hello import hello_cmd
from binex.stores.backends.memory import InMemoryExecutionStore, InMemoryArtifactStore


def test_hello_command():
    stores = InMemoryExecutionStore(), InMemoryArtifactStore()
    with patch("binex.cli.hello._get_stores", return_value=stores):
        runner = CliRunner()
        result = runner.invoke(hello_cmd, [])
        assert result.exit_code == 0
        assert "Hello from Binex!" in result.output
```
The patch target follows the pattern `binex.cli.<module>._get_stores`, where `<module>` matches the command file (e.g., `run`, `debug`, `hello`, `trace`, `replay`).
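The reason the target is the CLI module rather than wherever `_get_stores` is defined is standard `unittest.mock` behavior: you patch the name where it is looked up. A stdlib-only demonstration with two stand-in modules (names here are invented for illustration):

```python
import sys
import types
from unittest.mock import patch

# Two stand-in modules mirroring `from somewhere import _get_stores`
# inside a CLI module: the name is copied into the CLI module at import.
helpers = types.ModuleType("helpers_mod")
helpers._get_stores = lambda: "real stores"
sys.modules["helpers_mod"] = helpers

cli_mod = types.ModuleType("cli_mod")
cli_mod._get_stores = helpers._get_stores  # copied reference
sys.modules["cli_mod"] = cli_mod


def command():
    # The CLI command resolves _get_stores through its own module.
    return sys.modules["cli_mod"]._get_stores()


# Patching where the name is *looked up* takes effect:
with patch("cli_mod._get_stores", return_value="fake stores"):
    assert command() == "fake stores"

# Patching only the defining module leaves the CLI module's copy intact:
with patch("helpers_mod._get_stores", return_value="fake stores"):
    assert command() == "real stores"
```

This is why the docs insist on `binex.cli.<module>._get_stores` rather than the store module itself.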
### In-Memory Stores for Unit Tests

For non-CLI tests, use the in-memory store implementations directly:

```python
from binex.stores.backends.memory import InMemoryExecutionStore, InMemoryArtifactStore


async def test_store_roundtrip():
    store = InMemoryExecutionStore()
    await store.record(execution_record)
    result = await store.get_run(run_id)
    assert result is not None
```
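If you need to prototype store behavior outside the project, a dict-backed async fake along these lines is easy to write. This is a sketch only; the record shape and the real `InMemoryExecutionStore` interface may differ:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ExecutionRecord:
    # Illustrative record shape, not the real binex model
    run_id: str
    status: str


class FakeExecutionStore:
    """Dict-backed async store, keyed by run_id."""

    def __init__(self):
        self._runs: dict[str, ExecutionRecord] = {}

    async def record(self, rec: ExecutionRecord) -> None:
        self._runs[rec.run_id] = rec

    async def get_run(self, run_id: str):
        return self._runs.get(run_id)


async def main():
    store = FakeExecutionStore()
    await store.record(ExecutionRecord(run_id="r1", status="completed"))
    return await store.get_run("r1")


result = asyncio.run(main())
assert result is not None and result.status == "completed"
```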
### LiteLLM Mocking

Mock `litellm.acompletion` when testing LLM-backed nodes:

```python
from unittest.mock import AsyncMock, patch


async def test_llm_adapter():
    mock_response = AsyncMock()
    mock_response.choices = [AsyncMock(message=AsyncMock(content="result"))]
    with patch("litellm.acompletion", new_callable=AsyncMock, return_value=mock_response):
        result = await adapter.execute(task, inputs, trace_id)
    assert result[0].content == "result"
```
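The nested mock mimics the OpenAI-style response shape (`response.choices[0].message.content`) that litellm returns. You can verify the construction on its own with nothing but `unittest.mock` (keyword arguments to a mock constructor become plain attributes):

```python
from unittest.mock import AsyncMock

# Same construction as in the test above: AsyncMock(content="result")
# sets a .content attribute, so the nested attribute chain resolves.
mock_response = AsyncMock()
mock_response.choices = [AsyncMock(message=AsyncMock(content="result"))]

assert mock_response.choices[0].message.content == "result"
```

Since `choices` and `message` are only accessed as attributes (never awaited), `MagicMock` would work equally well for the inner objects.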
### Custom Test Adapters

For orchestrator tests, create simple adapter classes instead of mocking:

```python
class EchoAdapter:
    """Returns an artifact containing the node_id as content."""

    def __init__(self, content: str | None = None, *, fail: bool = False):
        self._content = content
        self._fail = fail
        self.call_count = 0

    async def execute(self, task, input_artifacts, trace_id):
        self.call_count += 1
        if self._fail:
            raise RuntimeError(f"Node {task.node_id} failed")
        content = self._content or f"result_from_{task.node_id}"
        return [
            Artifact(
                id=f"art_{task.run_id}_{task.node_id}",
                run_id=task.run_id,
                type="result",
                content=content,
                lineage=Lineage(
                    produced_by=task.node_id,
                    derived_from=[a.id for a in input_artifacts],
                ),
            )
        ]

    async def cancel(self, task_id: str) -> None:
        pass

    async def health(self):
        return AgentHealth.ALIVE
```

This pattern is used extensively in `test_qa_phase4_core.py` for testing orchestrator flows, retry logic, and DAG execution order.
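A typical use of the pattern checks both the failure path and the call counter. The sketch below trims the adapter to just the failure branch so it needs no `Artifact` or `Lineage` types; the task object is faked with `SimpleNamespace`:

```python
import asyncio
from types import SimpleNamespace


class FailingAdapter:
    """Trimmed-down adapter exercising only the failure branch."""

    def __init__(self):
        self.call_count = 0

    async def execute(self, task, input_artifacts, trace_id):
        self.call_count += 1
        raise RuntimeError(f"Node {task.node_id} failed")


async def main():
    adapter = FailingAdapter()
    task = SimpleNamespace(node_id="n1", run_id="run_1")
    try:
        await adapter.execute(task, [], "trace-1")
    except RuntimeError as exc:
        return adapter.call_count, str(exc)


count, message = asyncio.run(main())
assert count == 1
assert message == "Node n1 failed"
```

In real orchestrator tests the same idea drives retry assertions: a `fail=True` adapter records one `call_count` increment per retry attempt.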
## Writing YAML Workflow Files in Tests

Use `tmp_path` and `textwrap.dedent` to create temporary workflow files:

```python
import textwrap
from pathlib import Path


def _write_yaml(tmp_path: Path, content: str) -> Path:
    wf = tmp_path / "wf.yaml"
    wf.write_text(textwrap.dedent(content))
    return wf


def test_run_command(tmp_path):
    wf = _write_yaml(tmp_path, """\
        name: test
        nodes:
          node1:
            agent: "local://echo"
            system_prompt: do_stuff
            outputs: [result]
    """)
    # ... invoke CLI with str(wf)
```
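`textwrap.dedent` is what makes the indented triple-quoted string valid YAML: it strips the whitespace common to every line while preserving relative indentation. A quick standalone check:

```python
import textwrap

raw = """\
    name: test
    nodes:
      node1:
        agent: "local://echo"
"""
clean = textwrap.dedent(raw)

# The four-space common prefix is gone; nested keys keep their
# relative indentation.
assert clean.startswith("name: test")
assert "  node1:" in clean
```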
## Linting

Run ruff to check for style and import issues:

```bash
ruff check src/
```

The project uses these ruff rules: `E` (pycodestyle errors), `F` (pyflakes), `I` (isort), `N` (naming), `W` (warnings), `UP` (pyupgrade). Line length is 99 characters.
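Assuming the rules live in `pyproject.toml` (the usual place; the project may instead use a `ruff.toml`), the configuration would look roughly like this sketch:

```toml
[tool.ruff]
line-length = 99

[tool.ruff.lint]
select = ["E", "F", "I", "N", "W", "UP"]
```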
## Code Coverage

To run tests with coverage:

```bash
python -m pytest tests/ --cov=src/binex --cov-report=term-missing
```

The current test suite achieves approximately 96% code coverage.
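To keep that number from regressing in CI, coverage.py supports a hard threshold in `pyproject.toml` (a sketch; the project may not configure this yet, and the 95 below is an illustrative value):

```toml
[tool.coverage.report]
fail_under = 95
```

The equivalent one-off flag is `--cov-fail-under=95` on the pytest command line.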