Imitation Learning in UrbanVerse#

This chapter explains how to train Behavior Cloning (BC) policies for goal-directed navigation in UrbanVerse scenes. Policies can learn from expert demonstrations collected directly within UrbanVerse’s simulation environment, or from real-world teleoperated demonstrations mapped into UrbanVerse’s urban scenes.

UrbanVerse provides a unified framework for imitation learning, including consistent data formats, streamlined training APIs, and comprehensive evaluation tools that work seamlessly with UrbanVerse-160, CraftBench, and custom scenes.

Documentation Overview#

This documentation covers the complete imitation learning workflow:

  • Collecting expert demonstrations via teleoperation (keyboard, joystick, gamepad, VR) or by importing external data

  • Understanding the demonstration dataset format and how observations and actions are structured

  • Configuring BC training with appropriate architectures, hyperparameters, and loss functions

  • Using UrbanVerse’s IL APIs for training, inference, and evaluation

  • Best practices for effective behavior cloning in urban navigation tasks
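As a mental model for the dataset format mentioned above, a demonstration dataset typically stores one record per timestep, pairing the robot's observations with the expert's action. The exact schema is described later in this documentation; the field names below (`rgb`, `goal`, `action`, `done`) are illustrative assumptions, not UrbanVerse's actual format:

```python
# Hypothetical shape of a single demonstration timestep.
# Field names are illustrative; see the dataset-format section for the real schema.
step = {
    "observation": {
        "rgb": [0.0] * (64 * 64 * 3),  # flattened camera image (placeholder values)
        "goal": [12.5, -3.0],          # goal position relative to the robot
    },
    "action": [0.6, 0.1],              # e.g. (linear velocity, angular velocity)
    "done": False,                     # True on the final step of the episode
}

# An episode is a sequence of such steps; a dataset is a collection of episodes.
episode = [step]
print(len(episode), sorted(step.keys()))
```

Whatever the concrete schema, the key property is that each stored observation is paired with the action the expert actually took, which is what supervised BC training consumes.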

Basic Usage#

The following example demonstrates how to collect demonstrations and train a BC policy:

import urbanverse as uv

# 1. Collect expert demonstrations
demo_dir = uv.navigation.il.collect_data(
    scene_paths=["/path/to/UrbanVerse-160/CapeTown_0001/scene.usd"],
    robot_type="coco_wheeled",
    output_dir="demos/cape_town_teleop",
    control_mode="teleop_gamepad",
    max_episodes=20,
)

# 2. Train Behavior Cloning policy
checkpoint_path = uv.navigation.il.train_bc(
    demo_dir=demo_dir,
    robot_type="coco_wheeled",
    output_dir="outputs/bc_coco_capetown",
)

# 3. Load and use the policy
policy = uv.navigation.il.load_bc_policy(
    checkpoint_path=checkpoint_path,
    robot_type="coco_wheeled",
)

# 4. Evaluate the policy
results = uv.navigation.il.evaluate(
    policy=policy,
    scene_paths=["/path/to/UrbanVerse-160/Tokyo_0001/scene.usd"],
    robot_type="coco_wheeled",
    num_episodes=50,
)

print(f"Success Rate: {results['SR']:.2%}")
print(f"Route Completion: {results['RC']:.2%}")
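Conceptually, the `train_bc` step above reduces to supervised learning: the policy is regressed onto the expert's actions. The following framework-free sketch shows one such update with a linear policy and a mean-squared-error loss; the shapes, learning rate, and data are illustrative only, not what UrbanVerse uses internally:

```python
# Minimal behavior-cloning update: fit a linear policy a = W @ obs to the
# expert's actions via gradient descent on the MSE loss. Pure-Python sketch.
def bc_step(W, obs, expert_action, lr=0.1):
    # Predicted action: dot product of each weight row with the observation.
    pred = [sum(w * o for w, o in zip(row, obs)) for row in W]
    err = [p - a for p, a in zip(pred, expert_action)]
    loss = sum(e * e for e in err) / len(err)
    # d(loss)/d W[i][j] = 2 * err[i] * obs[j] / len(err)
    new_W = [
        [w - lr * 2 * e * o / len(err) for w, o in zip(row, obs)]
        for row, e in zip(W, err)
    ]
    return new_W, loss

W = [[0.0, 0.0], [0.0, 0.0]]          # 2 action dims x 2 observation dims
losses = []
for _ in range(50):
    W, loss = bc_step(W, obs=[1.0, 0.5], expert_action=[0.6, 0.1])
    losses.append(loss)

# The loss shrinks as the policy imitates the expert on this sample.
print(losses[0], losses[-1])
```

Real BC training adds minibatching, a deep network, and regularization, but the objective is the same: minimize the discrepancy between predicted and demonstrated actions.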

Benefits of Imitation Learning#

Imitation learning provides several advantages for training navigation policies:

  • Fast bootstrapping: Learn from expert demonstrations without designing reward functions

  • Real-world data integration: Incorporate teleoperated demonstrations from actual robot deployments

  • Warm start for RL: Use BC policies as initialization for reinforcement learning, accelerating training

  • Human-in-the-loop: Leverage human expertise and intuition for complex navigation scenarios
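The warm-start pattern from the list above is, at its core, just parameter transfer: train a policy with BC, then copy its weights into the RL policy before fine-tuning. Representing a policy as a plain dict of parameters, the idea looks like this (the structure and values are illustrative, not UrbanVerse's API):

```python
# Warm-starting RL from a BC policy: copy the imitation-trained parameters
# into the RL policy before fine-tuning, instead of starting from random init.
import copy

bc_policy = {"encoder": [0.4, -0.2], "head": [1.1, 0.3]}  # pretend BC weights
rl_policy = {"encoder": [0.0, 0.0], "head": [0.0, 0.0]}   # fresh RL policy

# Warm start: overwrite the RL parameters with the BC-trained ones.
rl_policy = copy.deepcopy(bc_policy)

# RL fine-tuning now perturbs competent weights rather than random ones,
# which typically cuts down the costly early-exploration phase.
print(rl_policy["head"])
```

Because UrbanVerse's IL and RL pipelines share the same observation and action spaces, this transfer requires no architectural changes to the policy.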

UrbanVerse’s imitation learning framework is designed to work seamlessly with the same scenes, robots, and observation/action spaces used in reinforcement learning, making it easy to combine both approaches.
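The `SR` and `RC` values printed in the basic-usage example are standard navigation metrics: success rate is the fraction of evaluation episodes that reach the goal, and route completion is the mean fraction of the route traversed. A sketch of how such metrics are typically aggregated, using hypothetical per-episode fields:

```python
# Aggregate navigation metrics over evaluated episodes. Each record here is a
# hypothetical (success, fraction-of-route-completed) pair, for illustration.
episodes = [
    {"success": True,  "route_completed": 1.0},
    {"success": False, "route_completed": 0.4},
    {"success": True,  "route_completed": 1.0},
    {"success": False, "route_completed": 0.7},
]

sr = sum(ep["success"] for ep in episodes) / len(episodes)          # success rate
rc = sum(ep["route_completed"] for ep in episodes) / len(episodes)  # route completion

print(f"Success Rate: {sr:.2%}")      # -> Success Rate: 50.00%
print(f"Route Completion: {rc:.2%}")  # -> Route Completion: 77.50%
```

Note that RC is informative even when SR is low: a policy that consistently covers most of the route but stalls near the goal scores very differently from one that fails immediately.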