π*0.6, a VLA that learns from experience

Blog Link: https://www.pi.website/blog/pistar06

Imitation is not enough because VLAs (vision-language-action models) make small mistakes, and these mistakes put the robot in states that differ from the training data. The errors then compound and lead to failure.
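To get a feel for how quickly small mistakes add up, here is a rough back-of-the-envelope sketch. It assumes every step succeeds independently with the same probability, which is my simplification and not something from the blog. Real compounding is likely worse, because each mistake pushes the robot further from the training data. The function name `task_success_probability` is made up for illustration.

```python
def task_success_probability(per_step_success: float, num_steps: int) -> float:
    """Chance that all num_steps succeed, assuming independent steps."""
    return per_step_success ** num_steps


# A policy that is right 99% of the time per step, over longer and longer tasks.
for steps in (10, 100, 1000):
    print(steps, task_success_probability(0.99, steps))
```

Even at 99% per-step accuracy, a 100-step task succeeds only about 37% of the time under this simplified model.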

It’s easy to get VLAs to succeed at a task some of the time; it’s hard to make them succeed reliably.

Coaching Corrections

One method Pi introduces in the blog is coaching with corrections: when the arm makes a mistake, a teleoperator takes over and shows it how to do the task properly.

Supervision alone is not optimal because it depends on the human’s ability to:

Identify the right time to intervene
Provide high-quality corrections
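The correction loop described above can be sketched as a rollout in which a human may override the policy at any step. This is a toy illustration, not Pi's implementation: `collect_with_corrections`, `drifting_policy`, `teleoperator`, and `dynamics` are all hypothetical names, and the 1-D "environment" just adds the action to the state.

```python
from typing import Callable, List, Optional, Tuple

State = float
Action = float


def collect_with_corrections(
    policy: Callable[[State], Action],
    human_correction: Callable[[State, Action], Optional[Action]],
    step: Callable[[State, Action], State],
    start: State,
    horizon: int,
) -> List[Tuple[State, Action, bool]]:
    """Roll out the policy; the human may replace any proposed action.

    Returns (state, executed_action, was_corrected) for every step.
    """
    state = start
    data = []
    for _ in range(horizon):
        proposed = policy(state)
        override = human_correction(state, proposed)  # None means "no intervention"
        action = proposed if override is None else override
        data.append((state, action, override is not None))
        state = step(state, action)
    return data


def drifting_policy(state: State) -> Action:
    return 0.5  # toy policy that always drifts to the right


def teleoperator(state: State, proposed: Action) -> Optional[Action]:
    # Toy human: intervenes only once the state has drifted past 1.0,
    # and then pushes it straight back to the target at 0.
    return -state if abs(state) > 1.0 else None


def dynamics(state: State, action: Action) -> State:
    return state + action


episode = collect_with_corrections(drifting_policy, teleoperator, dynamics, 0.0, 6)
corrections = [(s, a) for s, a, corrected in episode if corrected]
print(corrections)
```

Both weaknesses listed above show up in the sketch: the threshold inside `teleoperator` decides when the human intervenes, and the action it returns decides how good the correction is.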

Credit Assignment

The biggest problem with reinforcement learning is credit assignment: in long-horizon tasks, how do we know which actions were good and which were bad?

Pi trained their own credit assignment model to solve this
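One common way to turn a learned model into per-action credit is the one-step advantage, A_t = r_t + γ·V(s_{t+1}) − V(s_t). I am not claiming this is exactly what Pi does; it is a generic sketch, and the value estimates below are invented numbers standing in for a trained model's predictions.

```python
from typing import List


def td_advantages(rewards: List[float], values: List[float], gamma: float = 1.0) -> List[float]:
    """One-step advantages: A_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    values needs one more entry than rewards: the last entry is the value
    of the final state (0.0 when the episode has terminated).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


# Sparse reward: the task only pays out at the very end.
rewards = [0.0, 0.0, 0.0, 1.0]
# Hypothetical value estimates for the five states visited.
values = [0.5, 0.75, 0.25, 0.75, 0.0]

advantages = td_advantages(rewards, values)
print(advantages)
```

A negative advantage flags an action that made success less likely (the second step here), even though the only reward arrives at the end of the episode.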

Ayush Garg
