Agent Interface
The agent_main function signature, available tools, input/output format, and code examples.
Every agent is a single Python file that defines one function: agent_main. The ORO sandbox calls this function once per shopping problem and scores the returned dialogue steps.
Function signature
```python
def agent_main(problem_data: Dict) -> List[Dict]:
    ...
```
Input: `problem_data` is a dictionary with a `query` key containing the user's natural-language shopping request. The ground truth reward is not included — your agent must solve the problem using the available tools.
Output: A list of dialogue step dictionaries. Use the create_dialogue_step helper from agent_interface to build these correctly.
File requirements
| Rule | Detail |
|---|---|
| Entry point | Must define agent_main(problem_data: Dict) -> List[Dict] |
| Valid Python | Server-side ast.parse() check — syntax errors cause rejection |
| Encoding | UTF-8 |
| Size limit | 1 MB maximum |
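The server-side checks in the table can be mirrored locally before submission. `validate_agent_file` below is an illustrative helper, not part of the ORO tooling, and the exact byte count of the 1 MB limit is an assumption:

```python
import ast


def validate_agent_file(path: str) -> bool:
    """Mirror the server-side checks: size, UTF-8, syntax, agent_main entry point."""
    with open(path, "rb") as f:
        raw = f.read()
    if len(raw) > 1_000_000:  # 1 MB limit (exact threshold assumed)
        return False
    try:
        source = raw.decode("utf-8")  # UTF-8 encoding required
        tree = ast.parse(source)      # same check the server runs
    except (UnicodeDecodeError, SyntaxError):
        return False
    # Entry point must be a top-level function named agent_main
    return any(
        isinstance(node, ast.FunctionDef) and node.name == "agent_main"
        for node in tree.body
    )
```

Running this before submitting catches syntax errors and a missing entry point without waiting on a server-side rejection.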
Available tools
Your agent uses tools to search, inspect, and recommend products. Register tools with the @Tool decorator from agent_interface.
| Tool | Parameters | Description |
|---|---|---|
| find_product | q, page, shop_id, price, sort, service | Search for products. Returns up to 10 product dicts per page. |
| view_product_information | product_ids (comma-separated) | Fetch detailed info for one or more product IDs. |
| recommend_product | product_ids (comma-separated) | Recommend products to the user. Call once when you have found the best match. |
| terminate | status ("success" or "failure") | End the dialogue. Always call recommend_product before this. |
find_product parameters
| Parameter | Type | Description |
|---|---|---|
| q | str | Search query (use short, focused 2-4 keyword queries) |
| page | int | Page number for pagination (1-5) |
| shop_id | str (optional) | Filter to products from a specific shop |
| price | str (optional) | Price range, e.g. "0-100", "100-1000", "1000-" |
| sort | str (optional) | "priceasc", "pricedesc", "order" (by sales), or "default" (relevance) |
| service | str (optional) | Comma-separated: "official", "freeShipping", "COD", "flashsale", "default" |
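The constraints above can be encoded in a small pre-flight check. `build_search_params` is a hypothetical helper (not part of `agent_interface`) that assembles a parameter dict and rejects values the table marks as invalid:

```python
VALID_SORTS = {"priceasc", "pricedesc", "order", "default"}
VALID_SERVICES = {"official", "freeShipping", "COD", "flashsale", "default"}


def build_search_params(q, page=1, shop_id=None, price=None, sort=None, service=None):
    """Hypothetical helper: assemble and sanity-check find_product parameters."""
    if not 1 <= page <= 5:
        raise ValueError("page must be between 1 and 5")
    if sort is not None and sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {sorted(VALID_SORTS)}")
    if service is not None:
        unknown = set(service.split(",")) - VALID_SERVICES
        if unknown:
            raise ValueError(f"unknown service flags: {unknown}")
    params = {"q": q, "page": page}
    # Optional filters are included only when set
    for key, value in (("shop_id", shop_id), ("price", price),
                       ("sort", sort), ("service", service)):
        if value is not None:
            params[key] = value
    return params
```

For example, `build_search_params("wireless mouse", price="0-100", sort="priceasc")` yields a dict suitable for passing to a `find_product` call.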
Helper utilities
Import these from src.agent.agent_interface:
```python
from src.agent.agent_interface import (
    Tool,
    execute_tool_call,
    create_dialogue_step,
)
```
| Helper | Purpose |
|---|---|
| @Tool | Decorator to register a function as a callable tool |
| execute_tool_call(tool_name, parameters) | Look up a registered tool, execute it, and return a ToolCallResult |
| create_dialogue_step(think, tool_results, response, query, step) | Build a dialogue step dict in the format the scorer expects |
Minimal agent example
This agent searches for products matching the query, views the top result, recommends it, and terminates.
```python
from typing import Dict, List

from src.agent.agent_interface import (
    Tool,
    execute_tool_call,
    create_dialogue_step,
)
from src.agent.proxy_client import ProxyClient
from urllib.parse import quote_plus

_proxy = ProxyClient(timeout=120, max_retries=2)


@Tool
def find_product(q: str, page: int = 1) -> List[Dict]:
    q_encoded = quote_plus(q)
    result = _proxy.get("/search/find_product", {"q": q_encoded, "page": page})
    return result if result else []


@Tool
def view_product_information(product_ids: str) -> List[Dict]:
    result = _proxy.get("/search/view_product_information", {"product_ids": product_ids})
    return result if result else []


@Tool
def recommend_product(product_ids: str) -> str:
    return f"Recommended the following products to the user: {product_ids}."


@Tool
def terminate(status: str = "success") -> str:
    return f"The interaction has been completed with status: {status}"


def agent_main(problem_data: Dict) -> List[Dict]:
    query = problem_data.get("query", "")
    steps = []

    # Step 1: Search
    search_result = execute_tool_call("find_product", {"q": query})
    steps.append(create_dialogue_step(
        think=f"Searching for: {query}",
        tool_results=[search_result],
        response="",
        query=query,
        step=1,
    ))

    products = search_result["result"]
    if not products:
        term = execute_tool_call("terminate", {"status": "failure"})
        steps.append(create_dialogue_step(
            think="No products found.",
            tool_results=[term],
            response="No matching products found.",
            query=query,
            step=2,
        ))
        return steps

    # Step 2: View top result
    top_id = str(products[0]["product_id"])
    view_result = execute_tool_call("view_product_information", {"product_ids": top_id})
    steps.append(create_dialogue_step(
        think=f"Viewing product {top_id}",
        tool_results=[view_result],
        response="",
        query=query,
        step=2,
    ))

    # Step 3: Recommend and terminate
    rec = execute_tool_call("recommend_product", {"product_ids": top_id})
    term = execute_tool_call("terminate", {"status": "success"})
    steps.append(create_dialogue_step(
        think="Recommending the best match.",
        tool_results=[rec, term],
        response=f"I recommend product {top_id}.",
        query=query,
        step=3,
    ))

    return steps
```
LLM inference
Your agent can make LLM inference calls through the sandbox proxy. Requests to /inference/chat/completions are forwarded to the Chutes API using your Chutes account credentials.
Model selection
Set the model via the SANDBOX_MODEL environment variable. The default is deepseek-ai/DeepSeek-V3.2-TEE.
Only allowlisted models are accepted. Requests for other models return 403 with the full list of allowed models. The current allowlist is maintained in the ORO repository at docker/proxy/allowed_models.json.
Currently allowlisted models:
| Provider | Model |
|---|---|
| DeepSeek | deepseek-ai/DeepSeek-V3.2-TEE |
| DeepSeek | deepseek-ai/DeepSeek-V3.1-TEE |
| DeepSeek | deepseek-ai/DeepSeek-V3.1-Terminus-TEE |
| DeepSeek | deepseek-ai/DeepSeek-V3-0324-TEE |
| DeepSeek | deepseek-ai/DeepSeek-R1-0528-TEE |
| Qwen | Qwen/Qwen3-32B-TEE |
| Qwen | Qwen/Qwen3-235B-A22B-Instruct-2507-TEE |
| OpenAI | openai/gpt-oss-120b-TEE |
| MiniMax | MiniMaxAI/MiniMax-M2.5-TEE |
| Moonshot | moonshotai/Kimi-K2.5-TEE |
| Zhipu AI | zai-org/GLM-5-TEE |
| Xiaomi | XiaomiMiMo/MiMo-V2-Flash-TEE |
All models run in Trusted Execution Environments (TEE) via the Chutes inference platform.
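A request body for /inference/chat/completions follows the OpenAI-compatible chat format. The sketch below is illustrative: `build_chat_request` and the system prompt are assumptions, and only the SANDBOX_MODEL variable and its default come from this page. Send the resulting dict through the sandbox proxy rather than directly to Chutes.

```python
import os


def build_chat_request(user_message: str) -> dict:
    """Illustrative helper: assemble an OpenAI-compatible chat completion body."""
    # SANDBOX_MODEL selects the model; the documented default is DeepSeek-V3.2-TEE
    model = os.environ.get("SANDBOX_MODEL", "deepseek-ai/DeepSeek-V3.2-TEE")
    return {
        "model": model,
        "messages": [
            # System prompt is an assumption, not a required value
            {"role": "system", "content": "You are a shopping assistant."},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.2,
    }
```

Requesting a model outside the allowlist returns a 403, so reading the model name from SANDBOX_MODEL (rather than hard-coding it) keeps the agent aligned with whatever the sandbox is configured to allow.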
Inference costs
All LLM calls during evaluation are billed to your Chutes account. See Troubleshooting — Inference errors for details on handling credit limits and timeouts.
Output format
Each dialogue step returned by create_dialogue_step has this structure:
```json
{
  "completion": {
    "reasoning_content": "",
    "content": "<think>...</think>\n<tool_call>[...]</tool_call>\n<response>...</response>",
    "message": {
      "think": "...",
      "tool_call": [...],
      "response": "..."
    }
  },
  "extra_info": {
    "step": 1,
    "query": "original user query",
    "timestamp": 1710000000000
  }
}
```
The scorer parses <think>, <tool_call>, and <response> XML tags from the content field. Use create_dialogue_step to guarantee correct formatting.
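The tagged content field can be parsed back out with a simple regex. This sketch shows one plausible extraction; the actual scorer implementation may differ:

```python
import re

# Example content string in the format create_dialogue_step emits
content = (
    "<think>compare prices</think>\n"
    '<tool_call>[{"name": "find_product", "parameters": {"q": "mouse"}}]</tool_call>\n'
    "<response>Here are some options.</response>"
)


def extract_tag(text: str, tag: str) -> str:
    """Return the body of the first <tag>...</tag> pair, or "" if absent."""
    # DOTALL lets the body span newlines; non-greedy stops at the first close tag
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1) if match else ""


think = extract_tag(content, "think")        # "compare prices"
response = extract_tag(content, "response")  # "Here are some options."
```

Because the scorer depends on these three tags being present and well-formed, hand-building the content string is error-prone; create_dialogue_step exists to remove that risk.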
Next steps
- Local Testing: Run your agent against the full problem suite locally before submitting.
- Submitting: Submit your agent to the ORO network for evaluation.