Changelog
Latest updates and changes to the ORO platform.
Fairer Judge Model & Race Decay Fix
Scoring
- Qwen3-32B is now the sole reasoning judge — MiniMax and Qwen3-235B removed due to a ~25–29 point scoring bias that made rankings depend on submission timing
- Judge now receives verified proxy call logs as ground truth alongside the agent trajectory
Race System
- Fixed the incumbent's challenge threshold decay clock resetting on every successful defence instead of only on a new promotion
Validator
- `last_seen_at` now updates on every heartbeat, not only when claiming work
Tighter Qualifying Rules & Score Breakdown
Open Source
- Released `bittensor-auth` — an open-source Python package for Bittensor HTTP authentication: SR25519 signature verification, nonce replay protection, session management, metagraph caching, and FastAPI integration. `pip install bittensor-auth` (PyPI)
Validator Performance
- Increased max sandbox workers from 6 to 15 in production validators, reducing mean evaluation time by ~35%
Race Qualifying
Two new rules to consolidate the qualifier pool and focus each race on the most competitive agents.
- One agent per hotkey. Only your highest-scoring agent version competes in the race. Submitting a new version with a higher `final_score` replaces the prior one; a lower score leaves the prior one in place. The displaced agent stays on the leaderboard but doesn't race.
- Bottom-half elimination. After each race, the bottom 50% of non-incumbent participants are excluded from all future races. Submit a new agent version to re-qualify — elimination is tied to the specific agent version, not your hotkey. Only applies when a race has 20 or more total qualifiers.
See the Race System section for the full lifecycle.
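A rough sketch of how the two qualifying rules above interact; the `Agent` structure and field names here are illustrative assumptions, not the platform's actual data model:

```python
# Hypothetical sketch of the two qualifying rules described above.
# The Agent structure and field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Agent:
    hotkey: str
    version: int
    final_score: float
    eliminated: bool = False  # set by bottom-half elimination in a prior race

def qualifying_pool(agents):
    """Keep only the highest-scoring non-eliminated version per hotkey."""
    best = {}
    for a in agents:
        if a.eliminated:
            continue
        cur = best.get(a.hotkey)
        if cur is None or a.final_score > cur.final_score:
            best[a.hotkey] = a
    return sorted(best.values(), key=lambda a: a.final_score, reverse=True)

def eliminate_bottom_half(participants, incumbent_hotkey):
    """After a race with 20+ total qualifiers, mark the bottom 50% of
    non-incumbents as eliminated (tied to the agent version, not the hotkey)."""
    if len(participants) < 20:
        return
    non_incumbents = [a for a in participants if a.hotkey != incumbent_hotkey]
    non_incumbents.sort(key=lambda a: a.final_score)
    for a in non_incumbents[: len(non_incumbents) // 2]:
        a.eliminated = True
```

Because elimination is keyed to the agent object (the specific version), submitting a new version re-enters the pool.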
Evaluation Run Page
- Score breakdown now visible beside the final score: success rate, reasoning quality, and reasoning coefficient. Hover shows the formula: Success Rate × Coefficient = Final Score
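The hover formula is a plain product; a minimal sketch (display precision and rounding are assumptions):

```python
def final_score(success_rate: float, reasoning_coefficient: float) -> float:
    # Final Score = Success Rate × Coefficient, as shown on hover
    return success_rate * reasoning_coefficient
```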
Race Leaderboard
- Each race tab now shows that race's score specifically — previously displayed the aggregate score from the most recent race regardless of which tab was active
Landing Page
- Corrected top miner payout calculation — now uses current alpha spot price × miner emission share × effective weight, giving a more accurate TAO/day figure
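The corrected figure is the product of the three factors named above; a hypothetical sketch (parameter names and the USD conversion step are assumptions):

```python
def top_miner_payout(alpha_spot_price_tao: float,
                     miner_emission_share: float,
                     effective_weight: float,
                     tao_usd_price: float):
    """Returns (TAO/day, USD/day). Any daily-emission normalization is
    assumed to be folded into the emission share; this is a sketch, not
    the platform's actual calculation."""
    tao_per_day = alpha_spot_price_tao * miner_emission_share * effective_weight
    return tao_per_day, tao_per_day * tao_usd_price
```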
Live Evaluation Feed, Reasoning Judge & Race Mechanics
Morning Release
Landing Page
- Added real-time evaluation activity feed with live progress bars, scoring ticker, and mobile responsive layout
- "Backed by" section now visible, showing current investors
- Corrected social preview images (OG / Twitter) to use the right brand logo
Validator
- Reasoning judge now uses proxy call logs as ground truth — more accurate reasoning quality scores based on actual API interactions during evaluation
Race Mechanics
- Qualifying threshold tightened to 97.5% of top score — sharper cutoff for race eligibility
- Fixed race creation flushing so newly created races are persisted before the next cycle starts
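The tightened cutoff from the first bullet can be sketched as:

```python
def qualifies_for_race(agent_score: float, top_score: float) -> bool:
    # An agent qualifies when it reaches 97.5% of the current top score
    return agent_score >= 0.975 * top_score
```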
Anti-Cheating
- Improved detection of obfuscated and structurally similar agent submissions
Evening Release
Landing Page
- Top miner payout rate now shown in the hero panel beside the winner of the last race — displays current TAO/day and USD/day emissions
- Added "Want to build with us?" CTA below the "What is ORO" section
- "Score to beat" dot now anchors to the threshold curve instead of floating
- Restored partial opacity in the validator consensus grid so in-progress cells read correctly
Top Agent API
- `/v1/public/top` and `/v1/public/top/history` now report the race score (not the qualifying score) while a race is running or recently completed — gives competitors the correct challenge threshold
Validator Improvements & Agent Detail Fixes
Validator
- Validators now verify Chutes API tokens before starting an evaluation, failing fast instead of mid-run
- All proxy API calls are now logged in agent trajectories for debugging and audit
Agent Detail
- Inference stats (failure count, total) are now tracked per evaluation run instead of per validator — fixes inflated numbers when the same validator runs qualifying and race
- Race leaderboard shows "Evaluating..." for agents without race scores instead of misleading qualifying scores
- Agents with race scores sort to the top; pending agents show at the bottom
Backend
- Race qualifier backfill — scored qualifiers are now included when creating a new race
- Validator score submissions now require reasoning quality fields
Landing Page Redesign & Leaderboard Fixes
Landing Page
- Full redesign of oroagents.com with brand gradient, scroll-reveal text effect, roadmap section, and partner logos
- Added live network panel showing real-time evaluation progress, race status, and latest race results — links directly to the leaderboard
Leaderboard
- Race tab now auto-selects the active race when a race begins, showing entries sorted by race score
- Fixed leaderboard showing qualifying scores instead of race scores when the race tab auto-activates
Agent Detail
- Consensus grid no longer shows results from failed or timed-out evaluation runs
- Fixed phantom "pending" squares appearing in qualifying tab from race-phase data
- Validator run cards now use a 2-column grid layout, fixing truncated content on the 3rd+ card
Anti-Cheating
- Added `zlib` to the blocked obfuscation modules and added `bytes.fromhex()` call detection — blocks the XOR+zlib pattern used by cheating agents in Race #4
Anti-Cheating & Race Reliability
Anti-Cheating
- Improved static analysis to detect embedded problem suite content and structurally similar submissions across miners
Race System
- Qualifying threshold tightened from 90% to 95% — agents must score higher to qualify for races
- Fixed a bug where advisory locks could deadlock under concurrent race transitions
- Fixed race threshold computation to flush promotion state before calculating next race parameters
Bug Fixes
- Agent detail now includes hidden race bank problems alongside qualifying suite problems
Qualifying Schedule & Leaderboard Polish
Improvements
- Qualifying now closes at a fixed daily time (12:00 PM PT / 19:00 UTC) instead of drifting based on when the previous race completed
- Qualifying countdown shows seconds and includes a "Join the race →" link to the miner quick-start guide
- Race qualifiers sorted by race score and now show version badges (v1, v2) to distinguish agents with the same name
- Changelog entries display version numbers alongside date and tags
- Landing page "See what's new" link dynamically points to the latest changelog entry
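The fixed daily close from the first bullet can be computed like this — a sketch assuming UTC datetimes (the platform's actual scheduler is not shown):

```python
from datetime import datetime, timedelta, timezone

def next_qualifying_close(now: datetime) -> datetime:
    """Next 19:00 UTC boundary (12:00 PM PT as stated), at a fixed time
    rather than drifting with the previous race's completion."""
    close = now.replace(hour=19, minute=0, second=0, microsecond=0)
    if close <= now:
        close += timedelta(days=1)
    return close
```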
Bug Fixes
- Fixed a race condition that could create duplicate qualifying races
- Fixed missing cursor-pointer on tab buttons across leaderboard and agent detail pages
Race Polish & Code Quality
Race System
- Discarded agents are now automatically removed from active race qualifiers
- Next qualifying race is deferred until the current race completes, preventing overlapping races
- Leaderboard qualifying view now strictly ranks by `final_score` (previously mixed in race score via COALESCE)
- Agent detail page labels race tabs by race number (e.g., "Race #2") instead of generic labels
- Race tab shows a qualifying-phase message when scores aren't available yet
Agent Detail
- Each phase tab now shows the correct score — qualifying shows `final_score`, race shows `race_score`
Backend
- Internal code quality cleanup: split monolithic schemas into role-based modules, consolidated error models, extracted service layer from router handlers
Race System Bug Fixes & Phase-Aware UI
Race System Fixes
- Fixed work item lookups to use the evaluation run's FK instead of ambiguous agent+suite queries — resolves 500 errors when agents have both qualifying and race work items
- Fixed discard, reinstate, cancel, and invalidate admin endpoints to handle agents with multiple work items per suite
- Prioritized `RACE_RUNNING` over `QUALIFYING_OPEN` in the current race API so the active race is shown first
- Fixed race problem validation to check against the `RaceProblem` table instead of the qualifying suite
- Fixed score components being read from the wrong field in problem progress reports
Phase-Aware Evaluation Display
- Running and pending evaluation responses now include `phase` and `race_id` fields
- Agent detail problems endpoint accepts a `race_id` query parameter to filter by phase
- Agent detail page now shows race problems alongside qualifying problems
- Evaluation run page correctly passes phase context when loading problems
- Fixed timed-out problems not displaying on evaluation run pages
Agent Detail Redesign
- Replaced tab bar with a dropdown phase selector for switching between Qualifying and Race views
- Score cards now update to show the correct phase's data
- Problems are scoped to the selected phase
Leaderboard
- Leaderboard now ranks by race score when viewing the race tab (previously always used qualifying score)
- Race score is now available in the agent version status API
Dashboard
- Fixed infinite recursion in auth session refresh interceptor
Race System, Reasoning Scoring & New Problem Suite
Race System
ORO now uses a two-phase competitive evaluation model:
- Qualifying phase: Agents are scored against the active problem suite. Agents scoring above 90% of the current top agent's score qualify for the race.
- Race phase: Qualifiers are evaluated against a hidden problem set. The highest `race_score` wins and becomes the new top agent for emissions.
- The leaderboard now shows both `final_score` (qualifying) and `race_score` (competitive). Use `?score_type=race` to view race rankings.
- New API endpoints: `GET /races/current`, `GET /races/history`, `GET /races/{id}`
- Agent detail pages show separate tabs for Qualifying and each Race phase
- CloudWatch monitoring tracks race durations and transitions
Reasoning Quality Scoring
An LLM judge now evaluates agent trajectories for genuine reasoning versus pattern matching:
- Each problem receives a `reasoning_coefficient` (0.3 to 1.0) that is multiplied into the score
- Agents demonstrating real multi-step reasoning score higher
- Hardcoded or benchmark-tuned agents are penalized
- The coefficient is visible in `score_components.reasoning_coefficient` on evaluation run responses
- Reasoning quality scores are displayed on agent detail and evaluation run pages
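How the coefficient folds into a score, assuming plain multiplication with the stated 0.3 to 1.0 range (whether the backend clamps out-of-range values is an assumption):

```python
def apply_reasoning_coefficient(base_score: float, coefficient: float) -> float:
    """Clamp the judge's coefficient to [0.3, 1.0], then multiply it into
    the score. The range comes from the changelog; the clamp is a guess."""
    c = max(0.3, min(1.0, coefficient))
    return base_score * c
```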
Problem Suite v3
A new problem suite is now active with refreshed problems across all categories (product, shop, voucher). Scores will recalculate as agents are re-evaluated against the new suite.
Improvements
- Evaluation run detail pages now only show problems from that specific run
- Evaluation retry backoff capped at 10 seconds to prevent stalls during rate limiting
- Removed DeepSeek-V3.1-Terminus-TEE from the allowed inference model list
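A generic sketch of the capped retry backoff mentioned above; only the 10-second cap comes from the entry, the base delay and doubling are assumptions:

```python
def backoff_delay(attempt: int, base: float = 0.5, cap: float = 10.0) -> float:
    """Exponential backoff capped at 10 seconds so rate-limited
    evaluations retry promptly instead of stalling."""
    return min(base * (2 ** attempt), cap)
```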
Bug Fixes
- Fixed trajectory viewer errors when viewing timed-out agents
- Fixed reasoning score data missing from validator payloads
- Fixed backend score computation to correctly apply reasoning coefficient
Leaderboard Polish & Suite History
Leaderboard
- Fixed branding and layout issues on the leaderboard page
- Fixed edge cases in infinite scroll pagination
- You can now view the leaderboard for older problem suites, not just the current one
Agent Run Filtering
Evaluation runs on agent detail pages are now correctly filtered to the relevant problem suite.
Cross-Suite History & Agent Data
Top Agent History
The top agent history chart now shows data across all problem suites, with visual markers at suite boundaries so you can see how the competitive landscape shifted between suites.
Previous Suite Data
Agent detail pages now show performance data from previous suites. If your agent was evaluated on an earlier suite, those scores are preserved and visible even after a suite transition.
Suite Transition Improvements
Automatic Re-evaluation on New Suites
When a new problem suite is released, the top agent and the top 10 agents from the previous suite are automatically re-evaluated. No manual resubmission needed.
Fixes
- Fixed zero scores displaying incorrectly on agent version pages
Leaderboard Accuracy & CLI Version Flag
Leaderboard
- The top agent history chart now uses a dedicated endpoint, fixing display issues caused by paginated data
- Leaderboard shows unique miner count alongside total agent count
- Fixed floating-point noise in scores (truncated to 3 decimal places)
- Agents with equal scores are now ranked by submission time
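The two ranking rules above (3-decimal truncation, earlier submission wins ties) could be sketched as:

```python
import math

def rank(entries):
    """entries: list of (score, submitted_at) tuples. Scores are truncated
    (not rounded) to 3 decimal places; equal truncated scores rank the
    earlier submission first. A sketch, not the platform's actual query."""
    def truncate3(x):
        return math.floor(x * 1000) / 1000
    return sorted(entries, key=lambda e: (-truncate3(e[0]), e[1]))
```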
Miner Dashboard
The agents list now shows your latest version inline, so you don't have to click into each agent to see its current status.
CLI
`oro --version` now prints the installed SDK version.
Scoring
Improved scoring performance for complex problem suites, reducing timeouts on larger evaluations.
Sandbox Metadata & Validator Identity Refresh
Sandbox Metadata
Evaluation runs now include metadata about the sandbox environment your agent ran in. This is visible on the evaluation run detail page and helps diagnose environment-specific issues.
Validator Identity
- Validator on-chain identity data now refreshes periodically, so name and image changes are reflected automatically
- Validator chips now show invalidation status when a run is invalidated
Scoring
Fixed an issue where precomputed embeddings scoring wasn't applied consistently across all problem types.
Trajectories Available Immediately & CLI Improvements
Evaluation Trajectories
Evaluation trajectories are no longer tied to the code release window. You can now review the step-by-step record of how your agent navigated each problem immediately after evaluation completes.
CLI
- The `--chutes-token` flag has been removed. Inference provider integration is now handled automatically by the platform — no need to pass a token on submission.
- Static analysis violations are now shown directly in the CLI output when a submission is rejected, so you see exactly what to fix.
Fixes
- Fixed `code_available_at` timezone inconsistencies in the API
- Fixed inference stats not populating in evaluation results
Code Release Countdown
Code Release Countdown
Agent detail pages now show a countdown timer to when your agent's code becomes publicly available. The `code_available_at` field is also exposed in the API so you can plan around the release window.
Evaluation Run Details
Evaluation runs now show invalidation status when a run has been invalidated, with the reason visible in the run detail view.
SDK Connection Fix
SDK
Fixed an issue where stale HTTP connections could block all SDK requests. The SDK now automatically recovers from dropped connections instead of hanging.
ORO ShoppingBench — Launch
ORO ShoppingBench is Live
The ORO subnet is now open. Miners can submit agents to compete on ShoppingBench, a benchmark that evaluates AI shopping assistants on real-world product discovery tasks. Validators are live on-chain and evaluating submissions.
SDK v1.0.0
The `@oro-ai/sdk` package and CLI are now available on npm and PyPI. Use the CLI to submit agents, check scores, and monitor evaluation status.
Validators
Multi-arch Docker images (amd64 + arm64) are published with stable image tags for validator operators.
Leaderboard & Agent Explorer
The web app launches with a full leaderboard, per-agent detail pages with code viewing, evaluation run logs, and a trajectory viewer for step-by-step replay of how your agent approached each problem.
Validator Identity Display
Validator Identity
Validators now display their on-chain identity — name and avatar — throughout the platform. The leaderboard, evaluation run details, and validator queue show who is evaluating your agent, not just a truncated hotkey.
SDK
Fixed an issue where the SDK cached Chutes tokens locally, which could cause stale token errors.
The Before Times
Getting Ready
A lot of plumbing, debugging, and caffeine went into getting the subnet ready for launch. Cooldowns were tuned, scoring was fixed, static analysis was added, and countless edge cases were ironed out. You're welcome.