Blog Link: https://www.pi.website/blog/pistar06
Imitation is not enough because VLAs make small mistakes, and each mistake pushes the robot into states that differ from the training data; these errors compound and lead to failure
It’s easy to get VLAs to succeed at a task some of the time; it’s hard to make them succeed reliably
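A rough way to see why "some of the time" and "reliably" are so far apart: as a simplifying assumption of mine (not a model from the blog), suppose every step of a task succeeds independently with the same probability. The chance of finishing the whole task then shrinks exponentially with the number of steps. The function name and the 0.99 figure below are illustrative only.

```python
def task_success_rate(per_step_success: float, horizon: int) -> float:
    """Chance of completing every step of a task.

    Assumes each of the `horizon` steps succeeds independently with
    probability `per_step_success` (a deliberate simplification).
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be in [0, 1]")
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    return per_step_success ** horizon


if __name__ == "__main__":
    # Same per-step quality, increasingly long tasks.
    for horizon in (10, 100, 1000):
        rate = task_success_rate(0.99, horizon)
        print(f"horizon={horizon:5d}  success={rate:.4f}")
```

Real errors are not independent (one mistake makes the next more likely because the state is now unfamiliar), so this sketch is if anything optimistic.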
Coaching Corrections
A method Pi introduces in their blog is to coach corrections: when the arm is making mistakes, a teleoperator takes over and shows it how to do the task properly
Supervision is not optimal because it depends on the human’s ability to:
- Identify the right time to intervene
- Provide high-quality corrections
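To make the coaching loop concrete, here is a toy sketch on a 1-D "reach zero" task. Everything in it is hypothetical and invented for illustration: `policy_action`, `human_action`, the `intervene_threshold` rule, and the `Episode` container are not from the blog. The threshold stands in for the human's judgement about when to take over, which is exactly the part that makes this kind of supervision fragile.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    states: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    is_correction: list = field(default_factory=list)  # True where the human acted


def policy_action(state: float) -> float:
    # Hypothetical imperfect policy: heads toward 0 but with a constant bias.
    return -0.5 * state + 0.3


def human_action(state: float) -> float:
    # Hypothetical teleoperator: moves straight to the target at 0.
    return -state


def collect_with_coaching(start: float, steps: int, intervene_threshold: float) -> Episode:
    """Roll out the policy; the human takes over whenever the error is too large."""
    episode = Episode()
    state = start
    for _ in range(steps):
        needs_help = abs(state) > intervene_threshold
        action = human_action(state) if needs_help else policy_action(state)
        episode.states.append(state)
        episode.actions.append(action)
        episode.is_correction.append(needs_help)
        state = state + action
    return episode


if __name__ == "__main__":
    ep = collect_with_coaching(start=2.0, steps=5, intervene_threshold=0.5)
    for s, a, c in zip(ep.states, ep.actions, ep.is_correction):
        print(f"state={s:+.3f}  action={a:+.3f}  human={c}")
```

The steps flagged in `is_correction` are the ones that would be added to the training data as corrections. Setting the threshold too high means the human steps in late; setting it too low means the policy never visits its own failure states.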
Credit Assignment
The biggest problem with Reinforcement Learning is credit assignment: in long-horizon tasks, how do we know which actions are good and which ones are bad?
Pi trained their own credit assignment model to solve this
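As a generic illustration of what a credit assignment signal can look like (this is a textbook-style sketch, not Pi's model): if we have some estimate of how promising each state is, the one-step advantage `r_t + gamma * V(s_{t+1}) - V(s_t)` says whether an action left the robot better or worse off than expected. The function name and the reward and value numbers below are made up for the example.

```python
def td_advantages(rewards, values, gamma=1.0):
    """One-step advantages: r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` must hold one more entry than `rewards`; the last entry is the
    value of the state reached after the final action (0.0 if terminal).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    return [
        r + gamma * values[t + 1] - values[t]
        for t, r in enumerate(rewards)
    ]


if __name__ == "__main__":
    # Sparse reward: only the final step says whether the task succeeded.
    rewards = [0.0, 0.0, 0.0, 1.0]
    # Hypothetical value estimates; the dip at index 2 marks a bad state.
    values = [0.5, 0.6, 0.2, 0.7, 0.0]
    for t, adv in enumerate(td_advantages(rewards, values)):
        label = "good" if adv > 0 else "bad"
        print(f"step {t}: advantage={adv:+.2f} ({label})")
```

Even though the reward only arrives at the end, the action at step 1 is singled out as bad because it led into a state the value estimate considers much less promising.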