Blog Link: https://huggingface.co/spaces/lerobot/robot-folding
Tips for good data collection
- Practice before you record. Consistent, deliberate demonstrations are more valuable than hesitant or inconsistent ones
- Quality over speed. High-quality task execution is more valuable than fast, sloppy execution
- Be consistent within episodes. A coherent strategy is easier for the model to learn than movements that vary wildly each time
- Start small, then extend. Train a quick model, see what fails, then add diversity. Don’t try to collect the perfect dataset on day one
- Speed after quality. Once you’ve dialed in the quality and a consistent strategy, optimize for speed. But never sacrifice quality for it
- Watch your setup, not just your data. If the rig vibrates or frustrates operators, fix that before collecting more
Architecture
Real-Time Chunking (RTC)
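A toy sketch of the real-time chunking idea: execute actions from the current chunk while the next chunk is generated, keeping a committed prefix of already-promised actions so the new chunk continues them smoothly. The class name, chunk size, and freeze horizon are made-up illustration values, not from the post.

```python
from collections import deque

class RTCExecutor:
    """Toy sketch of real-time chunking: pop actions from the current chunk,
    and when the queue runs low, request a new chunk conditioned on the
    still-committed tail of the old one. Values here are illustrative."""

    def __init__(self, policy, chunk_size=8, freeze=2):
        self.policy = policy        # policy(obs, committed) -> list of actions
        self.chunk_size = chunk_size
        self.freeze = freeze        # actions we promise to execute unchanged
        self.queue = deque()

    def step(self, obs):
        if len(self.queue) <= self.freeze:
            committed = list(self.queue)
            new_chunk = self.policy(obs, committed)
            # keep the committed prefix, append the remainder of the new chunk
            self.queue = deque(committed + new_chunk[len(committed):])
        return self.queue.popleft()
```

The point of the committed prefix is to avoid discontinuities at chunk boundaries while inference for the next chunk runs in the background.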
Training Recipe
Training ran on 8xH100 GPUs with a per-GPU batch size of 32, using gradient accumulation and AdamW with a learning rate of
A large batch size is important for stable VLA training and drives the multi-GPU requirement
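A minimal sketch of the effective-batch-size arithmetic behind the recipe. The 8 GPUs and per-GPU batch size of 32 come from the post; the accumulation step count is a hypothetical stand-in, since the post doesn't give it.

```python
# Known from the post: 8 GPUs, per-GPU batch size 32, gradient accumulation.
num_gpus = 8
per_gpu_batch = 32
accum_steps = 4  # hypothetical: not stated in the post

# Effective batch = GPUs x per-GPU batch x accumulation steps
effective_batch = num_gpus * per_gpu_batch * accum_steps  # 1024 with these values

def accumulate(grads_per_microbatch):
    """Average micro-batch gradients before one optimizer step, which is
    numerically equivalent to a single large-batch gradient step."""
    total = 0.0
    for g in grads_per_microbatch:
        total += g
    return total / len(grads_per_microbatch)
```

Accumulation is what lets a fixed per-GPU memory budget still produce the large effective batch the training needs.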
Evaluation
Evals are critical: if your evals aren't good, every decision you make based on them will be wrong
Metrics
Metrics they gave for folding clothes:
- Success Rate: binary pass/fail per rollout
- Score: partial credit based on subtasks completed
- Fold Quality: a 1-5 rating of the final fold appearance, averaged across successful rollouts
- Completion Time: seconds to complete Level 1/Level 2, averaged across successful rollouts
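The four metrics above can be computed from a list of rollout records, sketched below. The field names (`success`, `subtasks_done`, etc.) are assumptions for illustration; the post only names the metrics.

```python
def summarize(rollouts):
    """Compute the four folding metrics from rollout dicts (field names assumed)."""
    successes = [r for r in rollouts if r["success"]]
    n = len(rollouts)
    return {
        # Success Rate: binary pass/fail per rollout
        "success_rate": len(successes) / n,
        # Score: partial credit from subtasks completed
        "score": sum(r["subtasks_done"] / r["subtasks_total"] for r in rollouts) / n,
        # Fold Quality: 1-5 rating, averaged over successful rollouts only
        "fold_quality": (sum(r["fold_quality"] for r in successes) / len(successes)
                         if successes else None),
        # Completion Time: seconds, averaged over successful rollouts only
        "completion_time": (sum(r["seconds"] for r in successes) / len(successes)
                            if successes else None),
    }
```

Note that fold quality and completion time are conditioned on success, so they can look good even when the success rate is low; read them together.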
Data to Deployment
They started simply by training pi0 and pi0.5 on the full dataset (5,688 episodes) for 200k steps each (~27 hours on 8xH100s)
The models could sometimes fold a laid-out shirt, but they were slow and produced poor-quality folds. They suspected the problem was in the data: different operators used different grip points and strategies for unspreading the shirt
Improving the data
- Removed demonstrations that didn't end with a properly folded shirt; if the end result isn't good, the demonstration isn't useful
- Length-based filtering using the LeRobot data visualizer to remove outliers; short episodes tend to be low quality
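The length-based filter can be sketched as a simple band on frame counts. The post says they used the LeRobot data visualizer to spot outliers; the thresholds below are hypothetical stand-ins.

```python
def filter_by_length(episode_lengths, min_frames=200, max_frames=2000):
    """Keep indices of episodes whose frame count falls inside the band.
    Thresholds are illustrative: very short episodes tend to be low quality,
    and very long ones are outliers."""
    return [i for i, n in enumerate(episode_lengths)
            if min_frames <= n <= max_frames]
```

In practice you would eyeball the length histogram first and set the band from it rather than hard-coding thresholds.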
They trained a SARM to answer a hard question: how do you measure "progress" in a long, multi-stage task like t-shirt folding?
They annotated every episode in both datasets using the SARM, giving them continuous per-timestep quality scores they could use in two ways: for data curation and for reward-weighted training
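The reward-weighted training use of those scores can be sketched as weighting each timestep's imitation loss by its SARM score. This is a pure-Python toy of the weighting itself, not the actual training loop; the function name and shapes are assumptions.

```python
def weighted_loss(per_step_losses, sarm_scores):
    """Weighted mean of per-timestep losses, with per-timestep SARM quality
    scores as weights: high-quality timesteps contribute more to the update."""
    total_w = sum(sarm_scores)
    return sum(l * w for l, w in zip(per_step_losses, sarm_scores)) / total_w
```

With uniform scores this reduces to ordinary behavior cloning; skewed scores down-weight the sloppy portions of an episode without discarding it entirely.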