Train Your Robots in UrbanVerse#
UrbanVerse provides a complete pipeline for training goal-point navigation policies in large-scale, physically grounded urban simulation scenes.
This page shows how to:
load UrbanVerse scenes (built-in or custom),
configure a robot (using coco_wheeled as an example),
set PPO and reward configurations,
and launch training via a Python module entrypoint and the uv.navigation.rl API.
All APIs follow the pattern:
import urbanverse as uv
Prerequisites#
Before training, make sure your scene roots are configured (see the Scene Caching section):
export URBANVERSE_SCENE_ROOT="/path/to/UrbanVerse-160" # Reconstructed + user-generated scenes
export URBANVERSE_CRAFTBENCH_ROOT="/path/to/UrbanVerse-CraftBench" # Artist-designed CraftBench test scenes
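If you want to fail fast from Python rather than at scene-loading time, a quick sanity check of these variables (plain standard-library code, nothing UrbanVerse-specific) looks like:
import os

for var in ("URBANVERSE_SCENE_ROOT", "URBANVERSE_CRAFTBENCH_ROOT"):
    root = os.environ.get(var)
    if not root or not os.path.isdir(root):
        raise RuntimeError(f"{var} is unset or does not point to an existing directory: {root!r}")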
Launch Training from CLI#
UrbanVerse exposes a navigation training entrypoint as a Python module:
python -m uv.navigation.rl.train \
--robot coco_wheeled \
--scene_mode urbanverse160 \
--num_scenes 32 \
--num_envs 32 \
--output_dir outputs/coco_nav
This will:
use the COCO wheeled robot,
load 32 scenes from $URBANVERSE_SCENE_ROOT (UrbanVerse-160),
run 32 parallel environments,
and train a PPO-based navigation policy, saving logs and checkpoints under
outputs/coco_nav.
Suggested CLI arguments#
--robot: robot type (e.g., coco_wheeled).
--scene_mode: urbanverse160 → load scenes from $URBANVERSE_SCENE_ROOT; custom → use a user-provided list of USD paths.
--scene_paths_file (for scene_mode=custom): path to a text/JSON file with a list of USD scene paths.
--num_scenes: number of scenes to sample (e.g., 32).
--num_envs: number of parallel environments.
--output_dir: output directory for checkpoints and logs.
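As an example of the custom mode, a launch against a curated scene list might look like the following (my_scenes.txt is a hypothetical file containing one USD scene path per line, matching the text-file variant accepted by --scene_paths_file):
python -m uv.navigation.rl.train \
--robot coco_wheeled \
--scene_mode custom \
--scene_paths_file my_scenes.txt \
--num_envs 32 \
--output_dir outputs/coco_nav_custom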
Internally, this module calls the high-level API
uv.navigation.rl.train_navigation_policy() described below.
Selecting Scenes for Training#
UrbanVerse supports two scene selection modes.
Option A: Custom list of USD scene paths#
Use this if you want to train on a curated subset of UrbanVerse-160 or
your own generated scenes stored under $URBANVERSE_SCENE_ROOT.
API#
uv.navigation.rl.build_scene_list(
scene_paths: list[str],
) -> list[str]
Example#
import os
import urbanverse as uv
# Option A: build an explicit list of USD scene paths under URBANVERSE_SCENE_ROOT
scene_paths = uv.navigation.rl.build_scene_list([
f"{os.environ['URBANVERSE_SCENE_ROOT']}/CapeTown_0001/scene.usd",
f"{os.environ['URBANVERSE_SCENE_ROOT']}/Tokyo_0005/scene.usd",
f"{os.environ['URBANVERSE_SCENE_ROOT']}/London_0007/scene.usd",
])
print("Loaded scenes:", len(scene_paths))
Option B: Use UrbanVerse-160 scenes#
If you prefer to directly use the built-in UrbanVerse-160 library,
UrbanVerse will load scenes from $URBANVERSE_SCENE_ROOT.
API#
uv.navigation.rl.load_urbanverse160_scenes(
num_scenes: int = 32,
split: str = "train",
scene_root: str | None = None,
) -> list[str]
If scene_root is None, it defaults to $URBANVERSE_SCENE_ROOT.
split can be used to implement your own train/val/test partitions.
Example#
import urbanverse as uv
scene_paths = uv.navigation.rl.load_urbanverse160_scenes(
num_scenes=32,
split="train",
)
print("Example scenes:", scene_paths[:3])
PPO Training Configuration#
UrbanVerse uses PPO with the hyperparameters from the UrbanVerse paper:
learning rate: 1e-4
discount factor: 0.99
GAE: λ = 0.95
PPO clip: 0.2
horizon length: 32
minibatch size: 512
number of mini-epochs: 5
KL threshold: 0.01
entropy coefficient: 0.002
gradient norm clipping: 1.0
1500 training epochs
API#
uv.navigation.rl.NavTrainConfig(
learning_rate: float = 1e-4,
gamma: float = 0.99,
gae_lambda: float = 0.95,
ppo_clip: float = 0.2,
kl_threshold: float = 0.01,
entropy_coef: float = 0.002,
critic_coef: float = 1.0,
max_grad_norm: float = 1.0,
horizon_length: int = 32,
minibatch_size: int = 512,
mini_epochs: int = 5,
bounds_loss_coef: float = 0.01,
training_epochs: int = 1500,
scenes_per_batch: int = 16,
scene_repeat_min: int = 4,
scene_repeat_max: int = 6,
scene_resample_interval_episodes: int = 100,
device: str = "cuda",
mixed_precision: bool = True,
)
Example#
import urbanverse as uv
# Override a few fields; the remaining fields keep the paper defaults listed above
train_cfg = uv.navigation.rl.NavTrainConfig(
learning_rate=1e-4,
minibatch_size=512,
training_epochs=1500,
)
Reward Configuration#
The reward follows the definition in the UrbanVerse paper:
Arrival reward \(R_A\): +2000 when the agent reaches the goal.
Collision penalty \(R_C\): −200 on collision with obstacles.
Position tracking \(R_P\): Gaussian shaping based on position error:
coarse: std = 5.0, weight = 10,
fine: std = 1.0, weight = 50.
Velocity reward \(R_V\): weight = 10, based on the cosine similarity between the current and target velocity (direction from the robot to the goal).
API#
uv.navigation.rl.NavRewardConfig(
arrived_reward: float = 2000.0,
collision_penalty: float = -200.0,
pos_std_coarse: float = 5.0,
pos_weight_coarse: float = 10.0,
pos_std_fine: float = 1.0,
pos_weight_fine: float = 50.0,
vel_weight: float = 10.0,
)
Example#
import urbanverse as uv
reward_cfg = uv.navigation.rl.NavRewardConfig(
arrived_reward=2000,
collision_penalty=-200,
)
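To make the shaping terms concrete, the sketch below shows one plausible way a per-step reward could be assembled from a NavRewardConfig. This is illustrative, not the UrbanVerse implementation: pos_error, robot_vel, and goal_dir are assumed inputs, and the exact Gaussian form exp(-e^2 / (2 * std^2)) is an assumption consistent with the std/weight parameters above.
import numpy as np

def sketch_nav_reward(pos_error, robot_vel, goal_dir, arrived, collided, cfg):
    # Sparse terms: arrival bonus and collision penalty
    r = cfg.arrived_reward * float(arrived) + cfg.collision_penalty * float(collided)
    # Gaussian position shaping, coarse and fine (functional form is an assumption)
    r += cfg.pos_weight_coarse * np.exp(-pos_error**2 / (2 * cfg.pos_std_coarse**2))
    r += cfg.pos_weight_fine * np.exp(-pos_error**2 / (2 * cfg.pos_std_fine**2))
    # Velocity term: cosine similarity between current velocity and the robot-to-goal direction
    cos_sim = np.dot(robot_vel, goal_dir) / (np.linalg.norm(robot_vel) * np.linalg.norm(goal_dir) + 1e-8)
    r += cfg.vel_weight * cos_sim
    return r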
High-Level Training API#
The main programmatic entrypoint for training is:
API#
uv.navigation.rl.train_navigation_policy(
robot_type: str,
scene_paths: list[str] | None = None,
use_urbanverse160: bool = False,
num_scenes: int = 32,
task_cfg: "NavTaskConfig" | None = None,
train_cfg: "NavTrainConfig" | None = None,
reward_cfg: "NavRewardConfig" | None = None,
output_dir: str = "outputs/navigation",
) -> str
Behavior:
If scene_paths is provided → use exactly these scenes (Option A).
If scene_paths is None and use_urbanverse160=True → load scenes from $URBANVERSE_SCENE_ROOT using num_scenes (Option B).
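For instance, the Option B path can be exercised directly, without building a scene list first; this sketch uses only the documented arguments and leaves the task, train, and reward configs at their defaults:
import urbanverse as uv

ckpt_dir = uv.navigation.rl.train_navigation_policy(
    robot_type="coco_wheeled",
    scene_paths=None,          # no explicit list
    use_urbanverse160=True,    # sample scenes from $URBANVERSE_SCENE_ROOT
    num_scenes=32,
    output_dir="outputs/coco_nav_ub160",
)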
End-to-End Example (Python)#
The example below uses 32 scenes from UrbanVerse-160 and trains a COCO wheeled robot navigation policy:
import urbanverse as uv
# 1. Load 32 scenes from URBANVERSE_SCENE_ROOT (Option B)
scene_paths = uv.navigation.rl.load_urbanverse160_scenes(
num_scenes=32,
split="train",
)
# 2. Configure the navigation task
task_cfg = uv.navigation.rl.NavTaskConfig(
robot_type="coco_wheeled",
horizon_length=32,
num_envs=32,
goal_tolerance=0.5,
)
# 3. PPO config
train_cfg = uv.navigation.rl.NavTrainConfig(
training_epochs=1500,
minibatch_size=512,
)
# 4. Reward config
reward_cfg = uv.navigation.rl.NavRewardConfig()
# 5. Launch training
ckpt_dir = uv.navigation.rl.train_navigation_policy(
robot_type="coco_wheeled",
scene_paths=scene_paths, # explicit list (Option A-style)
use_urbanverse160=False, # scenes already passed explicitly
num_scenes=32,
task_cfg=task_cfg,
train_cfg=train_cfg,
reward_cfg=reward_cfg,
output_dir="outputs/coco_nav",
)
print("Checkpoints and logs saved to:", ckpt_dir)
Gymnasium Environment Registration (Advanced)#
Internally, UrbanVerse registers a Gymnasium environment for navigation:
import gymnasium as gym
env_id = f"URBANVERSE-navigation-{task_cfg.robot_type}"
gym.register(
id=env_id,
entry_point="urbanverse.navigation.rl:UrbanVerseNavEnv",
disable_env_checker=True,
kwargs={
"scene_paths": scene_paths,
"task_cfg": task_cfg,
"train_cfg": train_cfg,
"reward_cfg": reward_cfg,
},
)
env = gym.make(env_id)
Advanced users can replace the entry_point with their own environment
class, as long as it follows the same interface.
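A minimal skeleton of such a replacement environment is sketched below, assuming the standard Gymnasium reset/step interface. The constructor keyword arguments mirror the kwargs passed to gym.register above; the observation and action spaces are placeholders you would replace with your robot's actual layout.
import gymnasium as gym
import numpy as np

class MyNavEnv(gym.Env):
    def __init__(self, scene_paths, task_cfg, train_cfg, reward_cfg):
        self.scene_paths = scene_paths
        self.task_cfg, self.train_cfg, self.reward_cfg = task_cfg, train_cfg, reward_cfg
        # Placeholder spaces: replace with your observation/action layout
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(64,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)
        return obs, {}  # observation, info

    def step(self, action):
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)
        # observation, reward, terminated, truncated, info
        return obs, 0.0, False, False, {}
Registering it only requires pointing entry_point at your own module path (e.g., "my_package.envs:MyNavEnv") while keeping the same kwargs.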
Configuration Classes (Conceptual Notes)#
The configuration objects:
uv.navigation.rl.NavTaskConfig
uv.navigation.rl.NavTrainConfig
uv.navigation.rl.NavRewardConfig
provide structured, declarative access to all relevant parameters: scenes, robot type, horizon length, PPO hyperparameters, and reward shaping.
They are intended to be lightweight config containers (e.g., implemented
as dataclasses or @configclass objects), making it straightforward to
modify and extend UrbanVerse's navigation training pipeline.