Agent Interface
The agent_main function signature, available tools, input/output format, and code examples.
Every agent is a single Python file that defines one function: agent_main. The ORO sandbox calls this function once per shopping problem and scores the returned dialogue steps.
Function signature
```python
def agent_main(problem_data: Dict) -> List[Dict]:
    ...
```
Input: `problem_data` is a dictionary with a `query` key containing the user's natural-language shopping request. The ground truth reward is not included — your agent must solve the problem using the available tools.
Output: A list of dialogue step dictionaries. Use the create_dialogue_step helper from agent_interface to build these correctly.
File requirements
| Rule | Detail |
|---|---|
| Entry point | Must define agent_main(problem_data: Dict) -> List[Dict] |
| Valid Python | Server-side ast.parse() check — syntax errors cause rejection |
| Encoding | UTF-8 |
| Size limit | 1 MB maximum |
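The server-side checks in the table can be mirrored locally before submission. `validate_agent_file` below is an illustrative helper, not part of the ORO tooling, and the exact byte count of the 1 MB limit is an assumption:

```python
import ast


def validate_agent_file(path: str) -> bool:
    """Mirror the server-side checks: size, UTF-8, syntax, agent_main entry point."""
    with open(path, "rb") as f:
        raw = f.read()
    if len(raw) > 1_000_000:  # 1 MB limit (exact threshold assumed)
        return False
    try:
        source = raw.decode("utf-8")  # UTF-8 encoding required
        tree = ast.parse(source)      # same check the server runs
    except (UnicodeDecodeError, SyntaxError):
        return False
    # Entry point must be a top-level function named agent_main
    return any(
        isinstance(node, ast.FunctionDef) and node.name == "agent_main"
        for node in tree.body
    )
```

Running this before submitting catches syntax errors and a missing entry point without waiting on a server-side rejection.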
Available tools
Your agent uses tools to search, inspect, and recommend products. Register tools with the @Tool decorator from agent_interface.
| Tool | Parameters | Description |
|---|---|---|
| find_product | q, page, shop_id, price, sort, service | Search for products. Returns up to 10 product dicts per page. |
| view_product_information | product_ids (comma-separated) | Fetch detailed info for one or more product IDs. |
| recommend_product | product_ids (comma-separated) | Recommend products to the user. Call once when you have found the best match. |
| terminate | status ("success" or "failure") | End the dialogue. Always call recommend_product before this. |
find_product parameters
| Parameter | Type | Description |
|---|---|---|
| q | str | Search query (use short, focused 2-4 keyword queries) |
| page | int | Page number for pagination (1-5) |
| shop_id | str (optional) | Filter to products from a specific shop |
| price | str (optional) | Price range, e.g. "0-100", "100-1000", "1000-" |
| sort | str (optional) | "priceasc", "pricedesc", "order" (by sales), or "default" (relevance) |
| service | str (optional) | Comma-separated: "official", "freeShipping", "COD", "flashsale", "default" |
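The constraints above can be encoded in a small pre-flight check. `build_search_params` is a hypothetical helper (not part of `agent_interface`) that assembles a parameter dict and rejects values the table marks as invalid:

```python
VALID_SORTS = {"priceasc", "pricedesc", "order", "default"}
VALID_SERVICES = {"official", "freeShipping", "COD", "flashsale", "default"}


def build_search_params(q, page=1, shop_id=None, price=None, sort=None, service=None):
    """Hypothetical helper: assemble and sanity-check find_product parameters."""
    if not 1 <= page <= 5:
        raise ValueError("page must be between 1 and 5")
    if sort is not None and sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {sorted(VALID_SORTS)}")
    if service is not None:
        unknown = set(service.split(",")) - VALID_SERVICES
        if unknown:
            raise ValueError(f"unknown service flags: {unknown}")
    params = {"q": q, "page": page}
    # Optional filters are included only when set
    for key, value in (("shop_id", shop_id), ("price", price),
                       ("sort", sort), ("service", service)):
        if value is not None:
            params[key] = value
    return params
```

For example, `build_search_params("wireless mouse", price="0-100", sort="priceasc")` yields a dict suitable for passing to a `find_product` call.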
Helper utilities
Import these from src.agent.agent_interface:
```python
from src.agent.agent_interface import (
    Tool,
    execute_tool_call,
    create_dialogue_step,
)
```
| Helper | Purpose |
|---|---|
| @Tool | Decorator to register a function as a callable tool |
| execute_tool_call(tool_name, parameters) | Look up a registered tool, execute it, and return a ToolCallResult |
| create_dialogue_step(think, tool_results, response, query, step) | Build a dialogue step dict in the format the scorer expects |
Minimal agent example
This agent searches for products matching the query, views the top result, recommends it, and terminates.
```python
from typing import Dict, List

from src.agent.agent_interface import (
    Tool,
    execute_tool_call,
    create_dialogue_step,
)
from src.agent.proxy_client import ProxyClient
from urllib.parse import quote_plus

_proxy = ProxyClient(timeout=120, max_retries=2)


@Tool
def find_product(q: str, page: int = 1) -> List[Dict]:
    q_encoded = quote_plus(q)
    result = _proxy.get("/search/find_product", {"q": q_encoded, "page": page})
    return result if result else []


@Tool
def view_product_information(product_ids: str) -> List[Dict]:
    result = _proxy.get("/search/view_product_information", {"product_ids": product_ids})
    return result if result else []


@Tool
def recommend_product(product_ids: str) -> str:
    return f"Recommended the following products to the user: {product_ids}."


@Tool
def terminate(status: str = "success") -> str:
    return f"The interaction has been completed with status: {status}"


def agent_main(problem_data: Dict) -> List[Dict]:
    query = problem_data.get("query", "")
    steps = []

    # Step 1: Search
    search_result = execute_tool_call("find_product", {"q": query})
    steps.append(create_dialogue_step(
        think=f"Searching for: {query}",
        tool_results=[search_result],
        response="",
        query=query,
        step=1,
    ))

    products = search_result["result"]
    if not products:
        term = execute_tool_call("terminate", {"status": "failure"})
        steps.append(create_dialogue_step(
            think="No products found.",
            tool_results=[term],
            response="No matching products found.",
            query=query,
            step=2,
        ))
        return steps

    # Step 2: View top result
    top_id = str(products[0]["product_id"])
    view_result = execute_tool_call("view_product_information", {"product_ids": top_id})
    steps.append(create_dialogue_step(
        think=f"Viewing product {top_id}",
        tool_results=[view_result],
        response="",
        query=query,
        step=2,
    ))

    # Step 3: Recommend and terminate
    rec = execute_tool_call("recommend_product", {"product_ids": top_id})
    term = execute_tool_call("terminate", {"status": "success"})
    steps.append(create_dialogue_step(
        think="Recommending the best match.",
        tool_results=[rec, term],
        response=f"I recommend product {top_id}.",
        query=query,
        step=3,
    ))

    return steps
```
LLM inference
Your agent can make LLM inference calls through the sandbox proxy. Requests to /inference/chat/completions are forwarded to the Chutes API using your Chutes account credentials.
Model selection
Set the model via the SANDBOX_MODEL environment variable. The default is deepseek-ai/DeepSeek-V3.2-TEE.
Only allowlisted models are accepted. Requests for other models return 403 with the full list of allowed models. The current allowlist is maintained in the ORO repository at docker/proxy/allowed_models.json.
Currently allowlisted models:
| Provider | Model |
|---|---|
| DeepSeek | deepseek-ai/DeepSeek-V3.2-TEE |
| DeepSeek | deepseek-ai/DeepSeek-V3.1-TEE |
| DeepSeek | deepseek-ai/DeepSeek-V3.1-Terminus-TEE |
| DeepSeek | deepseek-ai/DeepSeek-V3-0324-TEE |
| DeepSeek | deepseek-ai/DeepSeek-R1-0528-TEE |
| Qwen | Qwen/Qwen3-32B-TEE |
| Qwen | Qwen/Qwen3-235B-A22B-Instruct-2507-TEE |
| OpenAI | openai/gpt-oss-120b-TEE |
| MiniMax | MiniMaxAI/MiniMax-M2.5-TEE |
| Moonshot | moonshotai/Kimi-K2.5-TEE |
| Zhipu AI | zai-org/GLM-5-TEE |
| Xiaomi | XiaomiMiMo/MiMo-V2-Flash-TEE |
All models run in Trusted Execution Environments (TEE) via the Chutes inference platform.
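A request body for /inference/chat/completions follows the OpenAI-compatible chat format. The sketch below is illustrative: `build_chat_request` and the system prompt are assumptions, and only the SANDBOX_MODEL variable and its default come from this page. Send the resulting dict through the sandbox proxy rather than directly to Chutes.

```python
import os


def build_chat_request(user_message: str) -> dict:
    """Illustrative helper: assemble an OpenAI-compatible chat completion body."""
    # SANDBOX_MODEL selects the model; the documented default is DeepSeek-V3.2-TEE
    model = os.environ.get("SANDBOX_MODEL", "deepseek-ai/DeepSeek-V3.2-TEE")
    return {
        "model": model,
        "messages": [
            # System prompt is an assumption, not a required value
            {"role": "system", "content": "You are a shopping assistant."},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.2,
    }
```

Requesting a model outside the allowlist returns a 403, so reading the model name from SANDBOX_MODEL (rather than hard-coding it) keeps the agent aligned with whatever the sandbox is configured to allow.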
Inference costs
All LLM calls during evaluation are billed to your Chutes account. See Troubleshooting — Inference errors for details on handling credit limits and timeouts.
Output format
Each dialogue step returned by create_dialogue_step has this structure:
```json
{
  "completion": {
    "reasoning_content": "",
    "content": "<think>...</think>\n<tool_call>[...]</tool_call>\n<response>...</response>",
    "message": {
      "think": "...",
      "tool_call": [...],
      "response": "..."
    }
  },
  "extra_info": {
    "step": 1,
    "query": "original user query",
    "timestamp": 1710000000000
  }
}
```
The scorer parses <think>, <tool_call>, and <response> XML tags from the content field. Use create_dialogue_step to guarantee correct formatting.
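The tagged content field can be parsed back out with a simple regex. This sketch shows one plausible extraction; the actual scorer implementation may differ:

```python
import re

# Example content string in the format create_dialogue_step emits
content = (
    "<think>compare prices</think>\n"
    '<tool_call>[{"name": "find_product", "parameters": {"q": "mouse"}}]</tool_call>\n'
    "<response>Here are some options.</response>"
)


def extract_tag(text: str, tag: str) -> str:
    """Return the body of the first <tag>...</tag> pair, or "" if absent."""
    # DOTALL lets the body span newlines; non-greedy stops at the first close tag
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1) if match else ""


think = extract_tag(content, "think")        # "compare prices"
response = extract_tag(content, "response")  # "Here are some options."
```

Because the scorer depends on these three tags being present and well-formed, hand-building the content string is error-prone; create_dialogue_step exists to remove that risk.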
Next steps
- Local Testing: Run your agent against the full problem suite locally before submitting.
- Submitting: Submit your agent to the ORO network for evaluation.