End-to-End Example#
This section provides a complete, working example that demonstrates the full imitation learning workflow in UrbanVerse: from collecting demonstrations to training a BC policy and evaluating its performance.
Example Scenario#
We’ll train a behavior cloning policy for a COCO wheeled robot to navigate in UrbanVerse-160 scenes. The workflow includes:
Collecting 20 expert demonstrations in Cape Town scenes
Training a BC policy on the collected data
Evaluating the trained policy on Tokyo scenes (unseen during training)
Analyzing the results
Step 1: Collect Expert Demonstrations#
First, we collect demonstrations by teleoperating the robot:
import urbanverse as uv
import os
# Set up scene paths
scene_root = os.environ.get("URBANVERSE_SCENE_ROOT", "/path/to/UrbanVerse-160")
scene_paths = [
    f"{scene_root}/Africa_SouthAfrica_CapeTown_0001/scene.usd",
    f"{scene_root}/Africa_SouthAfrica_CapeTown_0002/scene.usd",
    f"{scene_root}/Africa_SouthAfrica_CapeTown_0003/scene.usd",
]
# Collect demonstrations
print("Starting data collection...")
print("Use your gamepad to control the robot. Navigate to the goal positions.")
print("Press 'A' button to end each episode and start the next one.")
demo_dir = uv.navigation.il.collect_data(
    scene_paths=scene_paths,
    robot_type="coco_wheeled",
    output_dir="demos/capetown_bc_demos",
    control_mode="teleop_gamepad",
    max_episodes=20,
    episode_length=300,
    goal_sampling="random",
)
print(f"✓ Collected {20} demonstration episodes")
print(f"✓ Data saved to: {demo_dir}")
Tips for Data Collection:
- Navigate smoothly and efficiently to goals
- Include some recovery behaviors (e.g., backing up from obstacles)
- Try to demonstrate diverse navigation scenarios, for example by collecting in additional scenes (see the sketch below)
- Aim for a mix of short- and long-range navigation
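If you want broader coverage, you can run the collector again over additional scenes and keep the result in a separate directory. A minimal sketch reusing only the collect_data parameters shown above; the 0004-0006 scene indices are placeholders for whatever extra scenes your copy of UrbanVerse-160 contains:

import urbanverse as uv
import os

scene_root = os.environ.get("URBANVERSE_SCENE_ROOT", "/path/to/UrbanVerse-160")

# Placeholder scene indices; substitute scenes that exist in your install.
extra_scenes = [
    f"{scene_root}/Africa_SouthAfrica_CapeTown_{i:04d}/scene.usd"
    for i in range(4, 7)
]
extra_demo_dir = uv.navigation.il.collect_data(
    scene_paths=extra_scenes,
    robot_type="coco_wheeled",
    output_dir="demos/capetown_bc_demos_extra",
    control_mode="teleop_gamepad",
    max_episodes=10,
    episode_length=300,
    goal_sampling="random",
)
print(f"✓ Extra demonstrations saved to: {extra_demo_dir}")

Keeping the extra demonstrations in their own directory lets you point a later training run at either set without re-collecting.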
Step 2: Train Behavior Cloning Policy#
Now we train a BC policy on the collected demonstrations:
import urbanverse as uv
# Configure training
train_cfg = {
    "learning_rate": 1e-4,
    "batch_size": 256,
    "train_epochs": 50,
    "weight_decay": 1e-5,
    "validation_split": 0.1,
    "early_stopping": True,
    "data_augmentation": {
        "color_jitter": True,
    },
}
print("Starting BC training...")
checkpoint_path = uv.navigation.il.train_bc(
    demo_dir="demos/capetown_bc_demos",
    robot_type="coco_wheeled",
    output_dir="outputs/bc_coco_capetown",
    train_cfg=train_cfg,
)
print("✓ Training complete!")
print(f"✓ Best model saved to: {checkpoint_path}")
Training Output: The training process will display progress information:
- Training loss and validation loss per epoch
- Training time and estimated completion
- Best model checkpoint path
You can monitor training progress using TensorBoard:
tensorboard --logdir outputs/bc_coco_capetown/logs/tensorboard
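If you prefer to read the curves programmatically (for example, in a notebook), the standard tensorboard package can load the event files directly. A minimal sketch; the "val/loss" scalar tag is an assumption, so check the tags your run actually logs via ea.Tags():

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point the accumulator at the TensorBoard log directory used above.
ea = EventAccumulator("outputs/bc_coco_capetown/logs/tensorboard")
ea.Reload()

# List the scalar tags that were actually written during training.
print(ea.Tags()["scalars"])

# "val/loss" is a hypothetical tag name; substitute one printed above.
for event in ea.Scalars("val/loss"):
    print(f"step {event.step}: {event.value:.4f}")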
Step 3: Evaluate the Trained Policy#
Evaluate the BC policy on unseen Tokyo scenes to test generalization:
import urbanverse as uv
import os
# Load the trained policy
policy = uv.navigation.il.load_bc_policy(
    checkpoint_path="outputs/bc_coco_capetown/checkpoints/best.pt",
    robot_type="coco_wheeled",
)
# Prepare test scenes (different from training scenes)
scene_root = os.environ.get("URBANVERSE_SCENE_ROOT", "/path/to/UrbanVerse-160")
test_scenes = [
    f"{scene_root}/Asia_Japan_Tokyo_0001/scene.usd",
    f"{scene_root}/Asia_Japan_Tokyo_0002/scene.usd",
]
# Evaluate
print("Evaluating BC policy on Tokyo scenes...")
results = uv.navigation.il.evaluate(
    policy=policy,
    scene_paths=test_scenes,
    robot_type="coco_wheeled",
    num_episodes=50,
    max_episode_steps=300,
)
# Print results
print("\n" + "="*50)
print("Evaluation Results")
print("="*50)
print(f"Success Rate (SR): {results['SR']:.2%}")
print(f"Route Completion (RC): {results['RC']:.2%}")
print(f"Collision Times (CT): {results['CT']:.2f}")
print(f"Distance-to-Goal (DTG): {results['DTG']:.2f} m")
print("="*50)
Expected Results:
- BC policies typically achieve a 50-70% success rate on similar scenes
- Performance may drop to 30-50% on significantly different scenes (cross-city evaluation); a per-scene breakdown, sketched below, makes this easier to diagnose
- Route completion is usually higher than success rate (policies often get close but don’t reach the goal)
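Aggregate numbers can hide per-scene differences. Because evaluate takes a list of scene paths, you can also run it one scene at a time and compare; this sketch reuses only the call signature and result keys shown above:

per_scene = {}
for scene in test_scenes:
    # Evaluate each Tokyo scene separately to see where the policy struggles.
    per_scene[scene] = uv.navigation.il.evaluate(
        policy=policy,
        scene_paths=[scene],
        robot_type="coco_wheeled",
        num_episodes=25,
        max_episode_steps=300,
    )

for scene, res in per_scene.items():
    name = scene.split("/")[-2]  # e.g. Asia_Japan_Tokyo_0001
    print(f"{name}: SR={res['SR']:.2%}  RC={res['RC']:.2%}  "
          f"CT={res['CT']:.2f}  DTG={res['DTG']:.2f} m")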
Step 4: Deploy the Policy#
Use the trained policy for autonomous navigation:
import urbanverse as uv
from urbanverse.navigation.config import EnvCfg, SceneCfg
# Load policy
policy = uv.navigation.il.load_bc_policy(
    checkpoint_path="outputs/bc_coco_capetown/checkpoints/best.pt",
    robot_type="coco_wheeled",
)
# Create environment
cfg = EnvCfg(
    scenes=SceneCfg(
        scene_paths=["/path/to/UrbanVerse-160/Africa_SouthAfrica_CapeTown_0001/scene.usd"],
        async_sim=False,
    ),
    robot_type="coco_wheeled",
    # ... remaining EnvCfg fields as needed
)
env = uv.navigation.rl.create_env(cfg)
# Run policy
obs = env.reset()
for step in range(300):
    action = policy(obs)  # BC policy prediction
    obs, reward, done, info = env.step(action)
    if done:
        print(f"Episode ended: {info.get('outcome', 'unknown')}")
        obs = env.reset()
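For longer deployments you may want a rough tally of how episodes end. The loop below extends the one above with simple bookkeeping; it assumes only what the snippet already shows, namely that info may carry an "outcome" entry when an episode finishes:

from collections import Counter

outcomes = Counter()
obs = env.reset()
for step in range(3000):  # longer rollout than above, purely illustrative
    action = policy(obs)
    obs, reward, done, info = env.step(action)
    if done:
        # Count how the episode ended ("unknown" if no outcome is reported).
        outcomes[info.get("outcome", "unknown")] += 1
        obs = env.reset()

print("Episode outcomes:", dict(outcomes))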
Complete Script#
Here’s the complete end-to-end script combining all steps:
"""
Complete BC Training and Evaluation Example
"""
import urbanverse as uv
import os
# Configuration
SCENE_ROOT = os.environ.get("URBANVERSE_SCENE_ROOT", "/path/to/UrbanVerse-160")
ROBOT_TYPE = "coco_wheeled"
# Step 1: Collect demonstrations
print("="*60)
print("Step 1: Collecting Expert Demonstrations")
print("="*60)
train_scenes = [
    f"{SCENE_ROOT}/Africa_SouthAfrica_CapeTown_{i:04d}/scene.usd"
    for i in range(1, 4)
]
demo_dir = uv.navigation.il.collect_data(
    scene_paths=train_scenes,
    robot_type=ROBOT_TYPE,
    output_dir="demos/capetown_bc",
    control_mode="teleop_gamepad",
    max_episodes=20,
)
# Step 2: Train BC policy
print("\n" + "="*60)
print("Step 2: Training BC Policy")
print("="*60)
checkpoint = uv.navigation.il.train_bc(
    demo_dir=demo_dir,
    robot_type=ROBOT_TYPE,
    output_dir="outputs/bc_coco_capetown",
)
# Step 3: Evaluate
print("\n" + "="*60)
print("Step 3: Evaluating BC Policy")
print("="*60)
policy = uv.navigation.il.load_bc_policy(
    checkpoint_path=checkpoint,
    robot_type=ROBOT_TYPE,
)
test_scenes = [
    f"{SCENE_ROOT}/Asia_Japan_Tokyo_{i:04d}/scene.usd"
    for i in range(1, 3)
]
results = uv.navigation.il.evaluate(
    policy=policy,
    scene_paths=test_scenes,
    robot_type=ROBOT_TYPE,
    num_episodes=50,
)
# Step 4: Print summary
print("\n" + "="*60)
print("Final Results")
print("="*60)
print(f"Success Rate: {results['SR']:.2%}")
print(f"Route Completion: {results['RC']:.2%}")
print(f"Collision Times: {results['CT']:.2f}")
print(f"Distance-to-Goal: {results['DTG']:.2f} m")
print("="*60)
Running the Example#
Save the script above as train_bc_example.py and run:
# Set environment variable
export URBANVERSE_SCENE_ROOT=/path/to/UrbanVerse-160
# Run the example
python train_bc_example.py
The script will guide you through data collection (you’ll need to teleoperate the robot), then automatically train and evaluate the policy.
Next Steps#
After training a BC policy, you can:
Fine-tune with RL: Use the BC policy as initialization for reinforcement learning (see Reinforcement Learning in UrbanVerse)
Collect more data: Expand your demonstration dataset to improve performance
Evaluate on CraftBench: Test your policy on the artist-crafted test scenes
Deploy in simulation: Use the policy for autonomous navigation tasks
The next section covers evaluation in more detail, including metrics interpretation and best practices for assessing BC policy performance.