Data Collection

OpenArm is designed as a data-native platform. This guide covers everything from wiring cameras to recording episodes in LeRobot format, with quality checks at every stage.

Before Recording

Hardware Connections for Recording

Data collection requires more hardware than basic arm control. This section covers what to connect and where before you start LeRobot.

📷 Wrist Camera

Mount a USB webcam or an Intel RealSense D435i on the end-effector flange. Connect it via USB 3.0 to sustain 30+ fps. Verify with ls /dev/video*.

📖 Overhead / Workspace Camera

A fixed camera above the workspace provides a global view. Mount it about 60 cm above the table, angled down 30°, on a second USB 3.0 port.

🔌 CAN Bus (arm control)

Already connected during setup. Verify with ip link show can0. The CAN interface must be up before starting LeRobot.

👤 Teleop Device

Use a 3D SpaceMouse, a second OpenArm as a leader arm, or a gamepad. Leader-follower teleoperation with two OpenArms yields the highest-quality demonstrations.

Camera sync note: LeRobot timestamps all streams at the host PC level. For multi-camera setups, use USB 3.0 hubs (not USB 2.0 hubs) to minimize latency skew between camera frames and joint state readings. Target less than 5 ms of skew between streams.
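To make the skew target concrete, here is a minimal sketch of how per-frame skew across streams could be measured. The timestamps below are synthetic and the helper is hypothetical, not a LeRobot API; in practice you would pull timestamps from the recorded dataset.

```python
# Hypothetical host-side timestamps (seconds) for two camera streams and
# the joint-state stream, aligned by frame index.
wrist_ts    = [0.000, 0.033, 0.067, 0.100]
overhead_ts = [0.002, 0.036, 0.069, 0.103]
joints_ts   = [0.001, 0.034, 0.068, 0.101]

def max_skew_s(*streams):
    """Worst-case spread between stream timestamps at any frame index."""
    return max(max(ts) - min(ts) for ts in zip(*streams))

skew = max_skew_s(wrist_ts, overhead_ts, joints_ts)
print(f"max skew: {skew * 1000:.1f} ms")  # should stay under 5 ms
assert skew < 0.005
```

If the measured skew exceeds the 5 ms budget, move cameras off shared USB 2.0 hubs before touching anything in software.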

Recording Workflow

Step-by-Step Recording Workflow

Follow these steps for each recording session. Each step builds on the last — do not skip steps.

1. Pre-session safety check

Clear the workspace (1 m radius), verify the arm reaches the home position freely, test E-stop before recording. See Safety page.

2. Bring up the CAN interface and ROS 2

```bash
sudo ip link set up can0
source /opt/ros/humble/setup.bash
source ~/openarm_ws/install/setup.bash
ros2 launch openarm_ros2 openarm.launch.py use_fake_hardware:=false can_interface:=can0
```

3. Home the arm

Run the homing routine to set the reference position before each session. The arm must reach its home position with no load on the end-effector.

```bash
python3 -m openarm_can.scripts.home --interface can0
```

4. Verify camera feeds

Check that every camera is streaming before starting LeRobot. A camera that fails to enumerate can silently corrupt your dataset if LeRobot does not flag the missing stream.

```bash
# Quick camera check: probes the first four /dev/video* devices
python3 -c "
import cv2
for i in range(4):
    cap = cv2.VideoCapture(i)
    if cap.isOpened():
        print(f'Camera {i}: OK')
    cap.release()
"
```

5. Set up the task scene

Place objects in consistent starting positions. Consistent scene initialization is critical for policy generalization. Photograph or mark the starting configuration.

6. Start LeRobot recording

```bash
source ~/.venvs/openarm/bin/activate
python -m lerobot.scripts.control_robot \
  --robot.type=openarm \
  --control.type=record \
  --control.fps=30 \
  --control.repo_id=your-username/openarm-pick-place-v1 \
  --control.num_episodes=50 \
  --control.single_task="Pick up the red cube and place it in the bin" \
  --control.warmup_time_s=5 \
  --control.reset_time_s=10
```

LeRobot will prompt you before each episode. Use warmup_time_s to prepare your teleop position before recording starts.

7. Review and replay episodes

After recording, replay suspicious episodes before finalizing the dataset. Delete corrupted or unusable recordings immediately, but keep genuine task failures (see "Failure episodes as data" below).

```bash
python -m lerobot.scripts.visualize_dataset \
  --repo_id=your-username/openarm-pick-place-v1 \
  --episode_index=0
```

8. Push to HuggingFace Hub

```bash
huggingface-cli login
python -m lerobot.scripts.push_dataset_to_hub \
  --repo_id=your-username/openarm-pick-place-v1
```

Dataset Format

LeRobot Dataset Format

LeRobot stores datasets in the HuggingFace dataset format using Parquet files for tabular data and MP4/PNG files for image streams. Each episode is a sequence of timestamped observations and actions.

Directory structure

```
your-username/openarm-pick-place-v1/
├── meta/
│   ├── info.json          # Dataset metadata, fps, shapes
│   ├── episodes.jsonl     # Per-episode metadata (task, length, outcome)
│   └── stats.json         # Min/max/mean/std for all fields
├── data/
│   └── chunk-000/
│       ├── episode_000000.parquet
│       ├── episode_000001.parquet
│       └── ...
└── videos/
    └── chunk-000/
        ├── observation.images.wrist_cam/
        │   ├── episode_000000.mp4
        │   └── ...
        └── observation.images.overhead_cam/
            └── ...
```
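A layout like the one above is easy to sanity-check before pushing to the Hub. The sketch below verifies a representative subset of the expected paths (the list is illustrative, not exhaustive, and this is not a LeRobot utility):

```python
import tempfile
from pathlib import Path

# Representative subset of the expected layout (illustrative, not exhaustive)
EXPECTED = [
    "meta/info.json",
    "meta/episodes.jsonl",
    "meta/stats.json",
    "data/chunk-000/episode_000000.parquet",
]

def missing_files(root: Path) -> list[str]:
    """Return expected dataset files that are absent under root."""
    return [rel for rel in EXPECTED if not (root / rel).is_file()]

# Demo on a scaffolded temporary directory
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    for rel in EXPECTED:
        (root / rel).parent.mkdir(parents=True, exist_ok=True)
        (root / rel).touch()
    print(missing_files(root))  # [] -> layout is complete
```

Running a check like this after each session catches a failed push or an interrupted recording before it pollutes the Hub repository.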

Episode data schema

Fields in each episode Parquet file:

| Field | Type | Description |
| --- | --- | --- |
| observation.state | float32[8] | Joint positions in radians (8 DOF) |
| observation.velocity | float32[8] | Joint velocities in rad/s |
| observation.effort | float32[8] | Joint torques in Nm |
| observation.images.* | video path | Reference to a frame in an MP4 video file |
| action | float32[8] | Target joint positions from the teleop device |
| timestamp | float64 | Unix timestamp in seconds |
| frame_index | int64 | Frame number within the episode |
| episode_index | int64 | Episode number within the dataset |
| next.done | bool | True on the last frame of an episode |
| task_index | int64 | Index into the task description lookup table |
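A quick structural check against this schema can be sketched in plain Python. The frame below is synthetic; field names come from the table above, and the validator is a hypothetical helper, not part of LeRobot:

```python
# Synthetic frame matching the episode schema above (8-DOF arm)
frame = {
    "observation.state":    [0.0] * 8,   # joint positions, rad
    "observation.velocity": [0.0] * 8,   # joint velocities, rad/s
    "observation.effort":   [0.0] * 8,   # joint torques, Nm
    "action":               [0.0] * 8,   # target joint positions
    "timestamp": 1700000000.0,
    "frame_index": 0,
    "episode_index": 0,
    "next.done": False,
    "task_index": 0,
}

VEC_FIELDS = ("observation.state", "observation.velocity",
              "observation.effort", "action")

def validate_frame(frame: dict, dof: int = 8) -> None:
    """Raise if a frame is missing fields or has wrong vector lengths."""
    for key in VEC_FIELDS:
        assert len(frame[key]) == dof, f"{key}: expected {dof} values"
    assert isinstance(frame["next.done"], bool)

validate_frame(frame)
print("frame OK")
```

In practice you would load the real frames from the episode Parquet files and run the same shape checks across the whole dataset.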

Failure episodes as data

OpenArm is designed to safely record failed attempts, not just successes. Failure trajectories — slippage, misgrasp, collision, recovery attempts — are first-class data critical for robust policy generalization. Do not delete failure episodes automatically. Instead, annotate them with the success field in episode metadata and let the training framework decide whether to use them.
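The annotate-don't-delete policy can be sketched as a small rewrite of episodes.jsonl. The key names below are illustrative assumptions; check your dataset's meta/ files for the real schema before adapting this:

```python
import json
import tempfile
from pathlib import Path

def mark_success(meta_path: Path, episode_index: int, success: bool) -> None:
    """Rewrite episodes.jsonl with a success flag on one episode.

    Key names ("episode_index", "success") are illustrative; verify them
    against your dataset's actual meta/episodes.jsonl schema.
    """
    episodes = [json.loads(line) for line in meta_path.read_text().splitlines()]
    for ep in episodes:
        if ep["episode_index"] == episode_index:
            ep["success"] = success
    meta_path.write_text("\n".join(json.dumps(ep) for ep in episodes) + "\n")

# Demo on a synthetic episodes.jsonl
with tempfile.TemporaryDirectory() as d:
    meta = Path(d) / "episodes.jsonl"
    meta.write_text('{"episode_index": 0, "length": 412}\n'
                    '{"episode_index": 1, "length": 388}\n')
    mark_success(meta, 1, False)  # annotate a failed grasp, keep the data
    print(meta.read_text())
```

The failure trajectory stays in the dataset; the flag lets the training framework filter or weight it later.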

Quality Assurance

Quality Checklist for Collected Data

Run through this checklist after each recording session and before pushing to the Hub. Poor-quality data hurts policy performance more than low episode count.

1. Episode length is consistent. All episodes for the same task should be within ±30% of the median length. Outliers usually indicate the operator paused, missed a grasp, or the recording was interrupted.
2. No missing camera frames. Check that every episode has the expected number of frames per stream. Run lerobot.scripts.visualize_dataset on 3–5 episodes to verify video quality.
3. Joint positions are within safe limits. Verify that observation.state never exceeds the joint limits in the specs. High-velocity spikes indicate a CAN dropout or control glitch; delete those episodes.
4. Task scene was reset between episodes. Each episode must start with the object in the same initial position. If you skip a reset, the policy will learn from inconsistent initial conditions and generalize poorly.
5. Camera coverage is complete. The wrist camera should always show the end-effector and the object being manipulated. The overhead camera should show the full workspace. Re-adjust mounts if the object leaves the frame mid-episode.
6. Demonstration style is consistent. All operators should use the same approach path and grasp style. Mixed strategies confuse policy training. Use a single operator per task version, or label episodes by operator.
7. Dataset stats look reasonable. Check meta/stats.json after recording. Verify that the action mean is near zero (not stuck at the joint limits) and that the action std is large enough to show variation across episodes.
8. Success rate is documented. Record the human success rate during collection. A 60–70% success rate is typical for contact-rich tasks. Lower success may indicate the task is too hard; higher may mean the task is too easy to provide a useful training signal.
Next Step

Training a Policy from Your Dataset

Once your dataset passes the quality checklist, you can train ACT or Diffusion Policy directly with LeRobot.

Train ACT

```bash
python -m lerobot.scripts.train \
  --policy.type=act \
  --dataset.repo_id=your-username/openarm-pick-place-v1 \
  --policy.chunk_size=100 \
  --training.num_epochs=5000 \
  --output_dir=outputs/act-pick-place
```

Train Diffusion Policy

```bash
python -m lerobot.scripts.train \
  --policy.type=diffusion \
  --dataset.repo_id=your-username/openarm-pick-place-v1 \
  --training.num_epochs=8000 \
  --output_dir=outputs/diffusion-pick-place
```

Go deeper: Read the full Data Collection Pipeline Overview in the Robotics Library for a thorough treatment of episode structure, dataset versioning, sim-to-real alignment, and multi-task dataset composition.
