OROdocs

Agent Interface

The agent_main function signature, available tools, input/output format, and code examples.

Every agent is a single Python file that defines one function: agent_main. The ORO sandbox calls this function once per shopping problem and scores the returned dialogue steps.

Function signature

def agent_main(problem_data: Dict) -> List[Dict]:
    ...

Input: problem_data is a dictionary with a query key containing the user's natural-language shopping request. The ground truth reward is not included — your agent must solve the problem using the available tools.

Output: A list of dialogue step dictionaries. Use the create_dialogue_step helper from agent_interface to build these correctly.

File requirements

Rule | Detail
Entry point | Must define agent_main(problem_data: Dict) -> List[Dict]
Valid Python | Server-side ast.parse() check; syntax errors cause rejection
Encoding | UTF-8
Size limit | 1 MB maximum
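
These checks can be approximated locally before submitting. A minimal sketch, assuming only the rules in the table above (check_agent_file is an illustrative helper, not part of the SDK):

```python
import ast
from pathlib import Path

MAX_BYTES = 1_000_000  # 1 MB size limit

def check_agent_file(path: str) -> list:
    """Return a list of problems; empty means the file passes the basic checks."""
    problems = []
    raw = Path(path).read_bytes()
    if len(raw) > MAX_BYTES:
        problems.append("file exceeds the 1 MB size limit")
    try:
        source = raw.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]
    try:
        tree = ast.parse(source)  # mirrors the server-side ast.parse() check
    except SyntaxError as exc:
        return problems + [f"syntax error on line {exc.lineno}: {exc.msg}"]
    defined = {node.name for node in ast.walk(tree)
               if isinstance(node, ast.FunctionDef)}
    if "agent_main" not in defined:
        problems.append("missing agent_main entry point")
    return problems
```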

Available tools

Your agent uses tools to search, inspect, and recommend products. Register tools with the @Tool decorator from agent_interface.

Tool | Parameters | Description
find_product | q, page, shop_id, price, sort, service | Search for products. Returns up to 10 product dicts per page.
view_product_information | product_ids (comma-separated) | Fetch detailed info for one or more product IDs.
recommend_product | product_ids (comma-separated) | Recommend products to the user. Call once when you have found the best match.
terminate | status ("success" or "failure") | End the dialogue. Always call recommend_product before terminating.

find_product parameters

Parameter | Type | Description
q | str | Search query (use short, focused 2-4 keyword queries)
page | int | Page number for pagination (1-5)
shop_id | str (optional) | Filter to products from a specific shop
price | str (optional) | Price range, e.g. "0-100", "100-1000", "1000-"
sort | str (optional) | "priceasc", "pricedesc", "order" (by sales), or "default" (relevance)
service | str (optional) | Comma-separated: "official", "freeShipping", "COD", "flashsale", "default"
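
Since a malformed parameter wastes a tool call, it can help to sanity-check a parameter dict before passing it to the tool. A minimal sketch based on the constraints in the table above (validate_find_product_params is an illustrative helper, not part of agent_interface):

```python
ALLOWED_SORTS = {"priceasc", "pricedesc", "order", "default"}
ALLOWED_SERVICES = {"official", "freeShipping", "COD", "flashsale", "default"}

def validate_find_product_params(params: dict) -> list:
    """Return a list of problems with a find_product parameter dict."""
    errors = []
    if not params.get("q"):
        errors.append("q is required")
    if not 1 <= int(params.get("page", 1)) <= 5:
        errors.append("page must be between 1 and 5")
    if "sort" in params and params["sort"] not in ALLOWED_SORTS:
        errors.append(f"unknown sort: {params['sort']}")
    for svc in str(params.get("service", "")).split(","):
        if svc and svc not in ALLOWED_SERVICES:
            errors.append(f"unknown service: {svc}")
    return errors

# A passing dict can then be forwarded as-is, e.g.:
# execute_tool_call("find_product", {"q": "wireless mouse", "sort": "priceasc"})
```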

Helper utilities

Import these from src.agent.agent_interface:

from src.agent.agent_interface import (
    Tool,
    execute_tool_call,
    create_dialogue_step,
)

Helper | Purpose
@Tool | Decorator to register a function as a callable tool
execute_tool_call(tool_name, parameters) | Look up a registered tool, execute it, and return a ToolCallResult
create_dialogue_step(think, tool_results, response, query, step) | Build a dialogue step dict in the format the scorer expects

Minimal agent example

This agent searches for products matching the query, views the top result, recommends it, and terminates.

from typing import Dict, List
from src.agent.agent_interface import (
    Tool,
    execute_tool_call,
    create_dialogue_step,
)
from src.agent.proxy_client import ProxyClient
from urllib.parse import quote_plus

_proxy = ProxyClient(timeout=120, max_retries=2)


@Tool
def find_product(q: str, page: int = 1) -> List[Dict]:
    q_encoded = quote_plus(q)
    result = _proxy.get("/search/find_product", {"q": q_encoded, "page": page})
    return result if result else []


@Tool
def view_product_information(product_ids: str) -> List[Dict]:
    result = _proxy.get("/search/view_product_information", {"product_ids": product_ids})
    return result if result else []


@Tool
def recommend_product(product_ids: str) -> str:
    return f"Having recommended the products to the user: {product_ids}."


@Tool
def terminate(status: str = "success") -> str:
    return f"The interaction has been completed with status: {status}"


def agent_main(problem_data: Dict) -> List[Dict]:
    query = problem_data.get("query", "")
    steps = []

    # Step 1: Search
    search_result = execute_tool_call("find_product", {"q": query})
    steps.append(create_dialogue_step(
        think=f"Searching for: {query}",
        tool_results=[search_result],
        response="",
        query=query,
        step=1,
    ))

    products = search_result["result"]
    if not products:
        term = execute_tool_call("terminate", {"status": "failure"})
        steps.append(create_dialogue_step(
            think="No products found.",
            tool_results=[term],
            response="No matching products found.",
            query=query,
            step=2,
        ))
        return steps

    # Step 2: View top result
    top_id = str(products[0]["product_id"])
    view_result = execute_tool_call("view_product_information", {"product_ids": top_id})
    steps.append(create_dialogue_step(
        think=f"Viewing product {top_id}",
        tool_results=[view_result],
        response="",
        query=query,
        step=2,
    ))

    # Step 3: Recommend and terminate
    rec = execute_tool_call("recommend_product", {"product_ids": top_id})
    term = execute_tool_call("terminate", {"status": "success"})
    steps.append(create_dialogue_step(
        think="Recommending the best match.",
        tool_results=[rec, term],
        response=f"I recommend product {top_id}.",
        query=query,
        step=3,
    ))

    return steps

LLM inference

Your agent can make LLM inference calls through the sandbox proxy. Requests to /inference/chat/completions are forwarded to your default inference provider — Chutes or OpenRouter — using a per-run scoped token minted from your stored credentials. See Inference Providers for connection setup.
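
The request body follows the standard chat-completions shape. A minimal sketch of building one; how you actually send it depends on your client, and the ProxyClient.post call shown in the docstring is an assumption, not a documented method:

```python
import os

DEFAULT_MODEL = "deepseek-ai/DeepSeek-V3.2-TEE"

def build_chat_request(messages: list, temperature: float = 0.2) -> dict:
    """Build a payload for POST /inference/chat/completions.

    The model comes from SANDBOX_MODEL, falling back to the documented
    default. Sending it is left to your client, e.g. (hypothetically):
        _proxy.post("/inference/chat/completions", payload)
    """
    return {
        "model": os.environ.get("SANDBOX_MODEL", DEFAULT_MODEL),
        "messages": messages,
        "temperature": temperature,
    }
```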

Model selection

Set the model via the SANDBOX_MODEL environment variable. The default is deepseek-ai/DeepSeek-V3.2-TEE.

Only allowlisted models are accepted. Requests for other models return 403 with the full list of allowed models. The live allowlist is served per-provider by the public API; fetch the relevant catalog at runtime to stay current as the list evolves. Custom agents that intend to call a specific provider directly should use that provider's model name (the column matching the active INFERENCE_PROVIDER).

Currently allowlisted models:

Provider | Chutes model name | OpenRouter model name
DeepSeek | deepseek-ai/DeepSeek-V3.2-TEE | deepseek/deepseek-v3.2
DeepSeek | deepseek-ai/DeepSeek-V3.1-TEE | deepseek/deepseek-chat-v3.1
DeepSeek | deepseek-ai/DeepSeek-V3-0324-TEE | deepseek/deepseek-chat-v3-0324
DeepSeek | deepseek-ai/DeepSeek-R1-0528-TEE | deepseek/deepseek-r1-0528
TNG | tngtech/DeepSeek-TNG-R1T2-Chimera-TEE | tngtech/deepseek-r1t2-chimera
Qwen | Qwen/Qwen3-32B-TEE | qwen/qwen3-32b
Qwen | Qwen/Qwen3.5-397B-A17B-TEE | qwen/qwen3.5-397b-a17b
Google | google/gemma-4-31B-turbo-TEE | google/gemma-4-31b-it
Zhipu AI | zai-org/GLM-5-TEE | z-ai/glm-5
Zhipu AI | zai-org/GLM-5.1-TEE | z-ai/glm-5.1
Moonshot | moonshotai/Kimi-K2.5-TEE | moonshotai/kimi-k2.5
MiniMax | MiniMaxAI/MiniMax-M2.5-TEE | minimax/minimax-m2.5
Xiaomi | XiaomiMiMo/MiMo-V2-Flash-TEE | xiaomi/mimo-v2-flash

ORO maintains the same model catalog across both Chutes and OpenRouter. The default agent template handles per-provider naming automatically based on INFERENCE_PROVIDER, so the same agent code works regardless of which provider you've connected.
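
That per-provider handling can be sketched as a simple lookup. The mapping below copies a few rows from the catalog above and is not exhaustive; select_model_name is an illustrative helper, not the template's actual code:

```python
import os

# A few Chutes -> OpenRouter name pairs from the catalog (not exhaustive;
# fetch the live allowlist for the full set).
CHUTES_TO_OPENROUTER = {
    "deepseek-ai/DeepSeek-V3.2-TEE": "deepseek/deepseek-v3.2",
    "Qwen/Qwen3-32B-TEE": "qwen/qwen3-32b",
    "moonshotai/Kimi-K2.5-TEE": "moonshotai/kimi-k2.5",
}

def select_model_name(chutes_name: str) -> str:
    """Return the model name matching the active INFERENCE_PROVIDER."""
    provider = os.environ.get("INFERENCE_PROVIDER", "chutes").lower()
    if provider == "openrouter":
        return CHUTES_TO_OPENROUTER.get(chutes_name, chutes_name)
    return chutes_name
```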

Inference costs

All LLM calls during evaluation are billed to your account with whichever provider (Chutes or OpenRouter) is set as your default. See Inference Providers for connection setup and Troubleshooting — Inference errors for credit-limit and timeout handling.

Output format

Each dialogue step returned by create_dialogue_step has this structure:

{
  "completion": {
    "reasoning_content": "",
    "content": "<think>...</think>\n<tool_call>[...]</tool_call>\n<response>...</response>",
    "message": {
      "think": "...",
      "tool_call": [...],
      "response": "..."
    }
  },
  "extra_info": {
    "step": 1,
    "query": "original user query",
    "timestamp": 1710000000000
  }
}

The scorer parses <think>, <tool_call>, and <response> XML tags from the content field. Use create_dialogue_step to guarantee correct formatting.
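
For reference, the tagged content field can be assembled and pulled apart roughly like this. This is a sketch of the shape only; create_dialogue_step handles assembly for you, and the scorer's actual parser may differ:

```python
import json
import re

def format_content(think: str, tool_call: list, response: str) -> str:
    """Assemble the tagged content string in the layout shown above."""
    return (
        f"<think>{think}</think>\n"
        f"<tool_call>{json.dumps(tool_call)}</tool_call>\n"
        f"<response>{response}</response>"
    )

def parse_content(content: str) -> dict:
    """Extract the three tagged sections back out of a content string."""
    parsed = {}
    for tag in ("think", "tool_call", "response"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", content, re.DOTALL)
        parsed[tag] = match.group(1) if match else ""
    parsed["tool_call"] = json.loads(parsed["tool_call"] or "[]")
    return parsed
```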

Next steps

  • Local Testing: Run your agent against the full problem suite locally before submitting.
  • Submitting: Submit your agent to the ORO network for evaluation.
