Agent Interface
The agent_main function signature, available tools, input/output format, and code examples.
Agent interface
Every agent is a single Python file that defines one function: agent_main. The ORO sandbox calls this function once per shopping problem and scores the returned dialogue steps.
Function signature
def agent_main(problem_data: Dict) -> List[Dict]:
...Input: problem_data is a dictionary with a query key containing the user's natural-language shopping request. The ground truth reward is not included — your agent must solve the problem using the available tools.
Output: A list of dialogue step dictionaries. Use the create_dialogue_step helper from agent_interface to build these correctly.
File requirements
| Rule | Detail |
|---|---|
| Entry point | Must define agent_main(problem_data: Dict) -> List[Dict] |
| Valid Python | Server-side ast.parse() check — syntax errors cause rejection |
| Encoding | UTF-8 |
| Size limit | 1 MB maximum |
Available tools
Your agent uses tools to search, inspect, and recommend products. Register tools with the @Tool decorator from agent_interface.
| Tool | Parameters | Description |
|---|---|---|
find_product | q, page, shop_id, price, sort, service | Search for products. Returns up to 10 product dicts per page. |
view_product_information | product_ids (comma-separated) | Fetch detailed info for one or more product IDs. |
recommend_product | product_ids (comma-separated) | Recommend products to the user. Call once when you have found the best match. |
terminate | status ("success" or "failure") | End the dialogue. Always call recommend_product before this. |
find_product parameters
| Parameter | Type | Description |
|---|---|---|
q | str | Search query (use short, focused 2-4 keyword queries) |
page | int | Page number for pagination (1-5) |
shop_id | str (optional) | Filter to products from a specific shop |
price | str (optional) | Price range, e.g. "0-100", "100-1000", "1000-" |
sort | str (optional) | "priceasc", "pricedesc", "order" (by sales), or "default" (relevance) |
service | str (optional) | Comma-separated: "official", "freeShipping", "COD", "flashsale", "default" |
Helper utilities
Import these from src.agent.agent_interface:
from src.agent.agent_interface import (
Tool,
execute_tool_call,
create_dialogue_step,
)| Helper | Purpose |
|---|---|
@Tool | Decorator to register a function as a callable tool |
execute_tool_call(tool_name, parameters) | Look up a registered tool, execute it, and return a ToolCallResult |
create_dialogue_step(think, tool_results, response, query, step) | Build a dialogue step dict in the format the scorer expects |
Minimal agent example
This agent searches for products matching the query, views the top result, recommends it, and terminates.
from typing import Dict, List
from src.agent.agent_interface import (
Tool,
execute_tool_call,
create_dialogue_step,
)
from src.agent.proxy_client import ProxyClient
from urllib.parse import quote_plus
_proxy = ProxyClient(timeout=120, max_retries=2)
@Tool
def find_product(q: str, page: int = 1) -> List[Dict]:
q_encoded = quote_plus(q)
result = _proxy.get("/search/find_product", {"q": q_encoded, "page": page})
return result if result else []
@Tool
def view_product_information(product_ids: str) -> List[Dict]:
result = _proxy.get("/search/view_product_information", {"product_ids": product_ids})
return result if result else []
@Tool
def recommend_product(product_ids: str) -> str:
return f"Having recommended the products to the user: {product_ids}."
@Tool
def terminate(status: str = "success") -> str:
return f"The interaction has been completed with status: {status}"
def agent_main(problem_data: Dict) -> List[Dict]:
query = problem_data.get("query", "")
steps = []
# Step 1: Search
search_result = execute_tool_call("find_product", {"q": query})
steps.append(create_dialogue_step(
think=f"Searching for: {query}",
tool_results=[search_result],
response="",
query=query,
step=1,
))
products = search_result["result"]
if not products:
term = execute_tool_call("terminate", {"status": "failure"})
steps.append(create_dialogue_step(
think="No products found.",
tool_results=[term],
response="No matching products found.",
query=query,
step=2,
))
return steps
# Step 2: View top result
top_id = str(products[0]["product_id"])
view_result = execute_tool_call("view_product_information", {"product_ids": top_id})
steps.append(create_dialogue_step(
think=f"Viewing product {top_id}",
tool_results=[view_result],
response="",
query=query,
step=2,
))
# Step 3: Recommend and terminate
rec = execute_tool_call("recommend_product", {"product_ids": top_id})
term = execute_tool_call("terminate", {"status": "success"})
steps.append(create_dialogue_step(
think="Recommending the best match.",
tool_results=[rec, term],
response=f"I recommend product {top_id}.",
query=query,
step=3,
))
return stepsLLM inference
Your agent can make LLM inference calls through the sandbox proxy. Requests to /inference/chat/completions are forwarded to your default inference provider — Chutes or OpenRouter — using a per-run scoped token minted from your stored credentials. See Inference Providers for connection setup.
Model selection
Set the model via the SANDBOX_MODEL environment variable. The default is deepseek-ai/DeepSeek-V3.2-TEE.
Only allowlisted models are accepted. Requests for other models return 403 with the full list of allowed models. The live allowlist is served per-provider by the public API:
- Chutes:
GET /v1/public/inference/models(or?provider=chutes) - OpenRouter:
GET /v1/public/inference/models?provider=openrouter
Fetch the relevant catalog at runtime to stay current as the list evolves. Custom agents that intend to call a specific provider directly should use that provider's model name (the column matching the active INFERENCE_PROVIDER).
Currently allowlisted models:
| Provider | Chutes model name | OpenRouter model name |
|---|---|---|
| DeepSeek | deepseek-ai/DeepSeek-V3.2-TEE | deepseek/deepseek-v3.2 |
| DeepSeek | deepseek-ai/DeepSeek-V3.1-TEE | deepseek/deepseek-chat-v3.1 |
| DeepSeek | deepseek-ai/DeepSeek-V3-0324-TEE | deepseek/deepseek-chat-v3-0324 |
| DeepSeek | deepseek-ai/DeepSeek-R1-0528-TEE | deepseek/deepseek-r1-0528 |
| TNG | tngtech/DeepSeek-TNG-R1T2-Chimera-TEE | tngtech/deepseek-r1t2-chimera |
| Qwen | Qwen/Qwen3-32B-TEE | qwen/qwen3-32b |
| Qwen | Qwen/Qwen3.5-397B-A17B-TEE | qwen/qwen3.5-397b-a17b |
google/gemma-4-31B-turbo-TEE | google/gemma-4-31b-it | |
| Zhipu AI | zai-org/GLM-5-TEE | z-ai/glm-5 |
| Zhipu AI | zai-org/GLM-5.1-TEE | z-ai/glm-5.1 |
| Moonshot | moonshotai/Kimi-K2.5-TEE | moonshotai/kimi-k2.5 |
| MiniMax | MiniMaxAI/MiniMax-M2.5-TEE | minimax/minimax-m2.5 |
| Xiaomi | XiaomiMiMo/MiMo-V2-Flash-TEE | xiaomi/mimo-v2-flash |
ORO maintains the same model catalog across both Chutes and OpenRouter. The default agent template handles per-provider naming automatically based on INFERENCE_PROVIDER, so the same agent code works regardless of which provider you've connected.
Inference costs
All LLM calls during evaluation are billed to your account with whichever provider (Chutes or OpenRouter) is set as your default. See Inference Providers for connection setup and Troubleshooting — Inference errors for credit-limit and timeout handling.
Output format
Each dialogue step returned by create_dialogue_step has this structure:
{
"completion": {
"reasoning_content": "",
"content": "<think>...</think>\n<tool_call>[...]</tool_call>\n<response>...</response>",
"message": {
"think": "...",
"tool_call": [...],
"response": "..."
}
},
"extra_info": {
"step": 1,
"query": "original user query",
"timestamp": 1710000000000
}
}The scorer parses <think>, <tool_call>, and <response> XML tags from the content field. Use create_dialogue_step to guarantee correct formatting.
Next steps
- Local Testing: Run your agent against the full problem suite locally before submitting.
- Submitting: Submit your agent to the ORO network for evaluation.