Test Your Robots on CraftBench#

Once you have trained a navigation policy in UrbanVerse, you can evaluate it on UrbanVerse-CraftBench — a curated collection of artist-designed, high-quality urban test scenes. These scenes contain clean geometry, consistent scale, and rich visual detail, making them ideal for benchmarking policy generalization.

UrbanVerse provides a unified evaluation API for navigation policies that reports four metrics:

  • Success Rate (SR)

  • Route Completion (RC)

  • Collision Times (CT)

  • Distance to Goal (DTG)

All evaluation APIs live under the uv.navigation.eval namespace and share the same top-level import:

import urbanverse as uv

Prerequisite#

Before running CraftBench evaluation, ensure you have set:

export URBANVERSE_CRAFTBENCH_ROOT="/path/to/UrbanVerse-CraftBench"

This directory should contain the 10 high-quality CraftBench scenes, each stored as a folder with a scene.usd and a flythrough.mp4.
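
Before launching an evaluation, a quick sanity check of this layout can save a failed run. The sketch below uses only the Python standard library and simply verifies the folder structure described above:

import os

scene_root = os.environ.get("URBANVERSE_CRAFTBENCH_ROOT")
assert scene_root, "URBANVERSE_CRAFTBENCH_ROOT is not set"

# Each scene folder is expected to hold a scene.usd (plus a flythrough.mp4 preview).
scene_dirs = sorted(
    d for d in os.listdir(scene_root)
    if os.path.isdir(os.path.join(scene_root, d))
)
print(f"Found {len(scene_dirs)} scene folders under {scene_root}")
for d in scene_dirs:
    if not os.path.isfile(os.path.join(scene_root, d, "scene.usd")):
        print(f"Warning: {d} has no scene.usd")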

Loading CraftBench Scenes#

You can load CraftBench scenes using:

API#

uv.navigation.eval.load_craftbench_scenes(
    scene_root: str | None = None,
) -> list[str]

If scene_root is None, it defaults to $URBANVERSE_CRAFTBENCH_ROOT.

Example#

import urbanverse as uv

craft_scenes = uv.navigation.eval.load_craftbench_scenes()
print("Loaded CraftBench scenes:", len(craft_scenes))
print(craft_scenes[:2])
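
If your scenes live outside the default location, pass the directory explicitly via the scene_root parameter from the signature above (the path here is a placeholder):

craft_scenes = uv.navigation.eval.load_craftbench_scenes(
    scene_root="/data/benchmarks/UrbanVerse-CraftBench",  # placeholder path
)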

Loading a Trained Policy#

UrbanVerse saves policy checkpoints in the directory specified during training (e.g., outputs/coco_nav).

You can load a trained PPO policy using:

API#

uv.navigation.eval.load_policy(
    checkpoint_path: str,
    device: str = "cuda",
) -> "Policy"

Example#

import urbanverse as uv

policy = uv.navigation.eval.load_policy(
    checkpoint_path="outputs/coco_nav/checkpoints/epoch_1500.pt"
)
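
load_policy also accepts a device argument (see the signature above), so you can load the checkpoint on the CPU, e.g. for a quick smoke test on a machine without a GPU:

# Load on CPU, e.g. for debugging without a GPU.
policy = uv.navigation.eval.load_policy(
    checkpoint_path="outputs/coco_nav/checkpoints/epoch_1500.pt",
    device="cpu",
)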

Evaluation API#

The core evaluation function is:

API#

uv.navigation.eval.evaluate_navigation_policy(
    policy,
    scene_paths: list[str],
    robot_type: str,
    num_episodes_per_scene: int = 10,
    max_episode_seconds: float = 30.0,
    goal_tolerance: float = 0.5,
    device: str = "cuda",
) -> dict

The returned dictionary contains the aggregated metrics (the values below are illustrative):

{
    "SR": 0.78,         # Success Rate
    "RC": 0.85,         # Route Completion
    "CT": 0.42,         # Collision Times (avg)
    "DTG": 1.87,        # Distance to Goal (meters)
    "per_scene": { ... }  # detailed metrics
}
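
The per_scene entry maps scene names to the same four metrics (see the output example below), so you can, for instance, rank scenes by success rate. A minimal sketch, assuming results holds the returned dictionary:

# Rank scenes from lowest to highest success rate to spot the hardest ones.
ranked = sorted(results["per_scene"].items(), key=lambda kv: kv[1]["SR"])
for scene_name, metrics in ranked:
    print(f"{scene_name}: SR={metrics['SR']:.2f}, DTG={metrics['DTG']:.2f} m")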

Metric Definitions#

Success Rate (SR): Fraction of episodes where the robot reaches the goal within the tolerance.

Route Completion (RC): Average fraction of the path completed before termination.

Collision Times (CT): Mean number of collision events per episode.

Distance to Goal (DTG): Mean Euclidean distance to the goal at episode termination, in meters.
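
For reference, the definitions above translate into a straightforward aggregation over episodes. The sketch below is illustrative only, not the evaluator's actual implementation; the Episode record is hypothetical:

from dataclasses import dataclass

@dataclass
class Episode:                    # hypothetical per-episode record
    reached_goal: bool            # goal reached within goal_tolerance
    route_fraction: float         # fraction of the path completed, in [0, 1]
    num_collisions: int           # collision events during the episode
    final_goal_distance: float    # Euclidean distance to goal at termination (m)

def aggregate(episodes: list[Episode]) -> dict:
    # Each metric is a per-episode quantity averaged over all episodes.
    n = len(episodes)
    return {
        "SR": sum(e.reached_goal for e in episodes) / n,
        "RC": sum(e.route_fraction for e in episodes) / n,
        "CT": sum(e.num_collisions for e in episodes) / n,
        "DTG": sum(e.final_goal_distance for e in episodes) / n,
    }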

End-to-End Example#

Below is a full example that loads the CraftBench scenes, loads a trained policy, and runs the evaluation.

import urbanverse as uv

# 1. Load CraftBench scenes
craft_scenes = uv.navigation.eval.load_craftbench_scenes()

# 2. Load a trained navigation policy
policy = uv.navigation.eval.load_policy(
    checkpoint_path="outputs/coco_nav/checkpoints/epoch_1500.pt"
)

# 3. Run evaluation
results = uv.navigation.eval.evaluate_navigation_policy(
    policy=policy,
    scene_paths=craft_scenes,
    robot_type="coco_wheeled",
    num_episodes_per_scene=10,    # evaluate 10 trajectories per test scene
    goal_tolerance=0.5,
    max_episode_seconds=30.0,
)

print("Overall CraftBench Evaluation:")
print(results)

Output Example#

{
    "SR": 0.72,
    "RC": 0.81,
    "CT": 0.56,
    "DTG": 1.94,
    "per_scene": {
        "CraftBench_0001": {"SR": 0.7, "RC": 0.84, "CT": 0.4, "DTG": 1.8},
        "CraftBench_0002": {"SR": 0.8, "RC": 0.90, "CT": 0.3, "DTG": 1.2},
        ...
    }
}
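
To track progress across checkpoints, you may want to persist the results dict, for example as JSON (a minimal sketch; assumes every value in results is JSON-serializable):

import json

# Save aggregated and per-scene metrics for later comparison.
with open("outputs/coco_nav/craftbench_eval.json", "w") as f:
    json.dump(results, f, indent=2)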

Summary#

This page demonstrates how to:

  • load CraftBench scenes,

  • load a trained RL policy,

  • run an UrbanVerse-compliant navigation evaluator,

  • compute SR, RC, CT, and DTG metrics.

CraftBench serves as UrbanVerse’s standardized, challenging generalization benchmark, allowing you to objectively measure the real-world readiness of your trained navigation policies.