Blog Link: https://www.pi.website/blog/pistar06

Imitation alone is not enough: VLAs make small mistakes, and each mistake puts the robot in a state that differs from the training data. These errors compound and eventually lead to failure.
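
A rough way to see the compounding, as a toy calculation rather than anything from the blog: assume (hypothetically) that every step has the same independent chance of a mistake and that a single mistake ruins the episode. The function name and the 1% error rate are illustrative assumptions.

```python
def success_probability(per_step_error: float, horizon: int) -> float:
    """Chance of completing `horizon` steps without a single mistake.

    Toy assumption: mistakes are independent and any one of them is fatal.
    Real policies can sometimes recover, so this is a pessimistic sketch.
    """
    return (1.0 - per_step_error) ** horizon


if __name__ == "__main__":
    # Same 1% per-step error rate, increasingly long tasks.
    for horizon in (10, 100, 1000):
        print(horizon, success_probability(0.01, horizon))
```

With a 1% per-step error rate the success chance is roughly 90% over 10 steps, about 37% over 100 steps, and well under 1% over 1000 steps, which is why long-horizon tasks are so much harder to do reliably.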

It’s easy to get VLAs to succeed at a task some of the time; it’s hard to make them succeed reliably.

Coaching Corrections

One method Pi introduces in the blog is coaching through corrections: when the arm makes a mistake, a teleoperator takes over and shows it how to do the task properly.

Supervision is not optimal, because it depends on the human’s ability to:

  • Identify the right time to intervene
  • Provide high-quality corrections
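
A minimal sketch of what collecting corrections could look like, assuming a simple intervention loop (this is not Pi's actual pipeline). All names here (`rollout_with_corrections`, `needs_help`, `human`, the 1-D toy state) are hypothetical. The two callbacks map onto the two bullets above: `needs_help` is the timing of the intervention, `human` is the quality of the correction.

```python
from typing import Callable, List, Tuple

# Toy setting: the state is a 1-D offset from the target position (0.0).
State = float
Action = float


def rollout_with_corrections(
    policy: Callable[[State], Action],
    human: Callable[[State], Action],
    needs_help: Callable[[State], bool],
    state: State,
    horizon: int,
) -> Tuple[State, List[Tuple[State, Action]]]:
    """Run the policy, letting a human take over whenever `needs_help` fires.

    Returns the final state and the (state, human_action) pairs that could
    later be used as extra training data.
    """
    corrections: List[Tuple[State, Action]] = []
    for _ in range(horizon):
        if needs_help(state):
            action = human(state)  # teleoperator takes over
            corrections.append((state, action))
        else:
            action = policy(state)  # robot acts on its own
        state = state + action  # toy dynamics
    return state, corrections


if __name__ == "__main__":
    final_state, data = rollout_with_corrections(
        policy=lambda s: 0.4,              # flawed policy: always drifts right
        human=lambda s: -s,                # correction: move straight back to 0
        needs_help=lambda s: abs(s) > 1.0, # human notices once the drift is large
        state=0.0,
        horizon=8,
    )
    print(final_state, len(data))
```

If `needs_help` fires too late or `human` returns a poor action, the collected data is correspondingly worse, which is the limitation listed above.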

Credit Assignment

The biggest problem with reinforcement learning is credit assignment: in long-horizon tasks, how do we know which actions were good and which were bad?

Pi trained its own credit assignment model to address this.
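
These notes don't record how Pi's model works, so the following is only a generic illustration of credit assignment, not their method. One standard approach is to learn a value estimate for each state and score each action by its one-step advantage: reward plus the discounted value of the next state, minus the value of the current state. The function name and the numbers are made up for the example.

```python
from typing import List


def td_advantages(
    rewards: List[float], values: List[float], gamma: float = 0.99
) -> List[float]:
    """One-step advantage for each action in a trajectory.

    `values` must have one more entry than `rewards`: values[t] is the
    estimated value of the state before action t, and values[-1] is the
    value of the final state (0.0 if the episode has ended).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values needs exactly len(rewards) + 1 entries")
    return [
        reward + gamma * values[t + 1] - values[t]
        for t, reward in enumerate(rewards)
    ]


if __name__ == "__main__":
    # Sparse reward: only the last step pays off.
    rewards = [0.0, 0.0, 1.0]
    # Hypothetical value estimates; the first action made things worse
    # (0.5 -> 0.2), the second made them much better (0.2 -> 0.9).
    values = [0.5, 0.2, 0.9, 0.0]
    print(td_advantages(rewards, values, gamma=1.0))
```

Even though only the final step is rewarded, the first action gets a negative score and the second a positive one, so good and bad actions can be told apart within a single long episode.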