Behavior cloning (BC) loss is the supervised-learning objective used to train a policy to imitate expert demonstrations.
Given a dataset of expert demonstrations

$$\mathcal{D} = \{(s_i, a_i)\}_{i=1}^{N},$$

behavior cloning trains a policy $\pi_\theta(a \mid s)$ to reproduce the expert's action in each state.
Discrete Actions
For discrete actions, the BC loss is usually cross entropy / negative log-likelihood:

$$\mathcal{L}_{\text{BC}}(\theta) = -\mathbb{E}_{(s, a) \sim \mathcal{D}}\left[\log \pi_\theta(a \mid s)\right]$$
This is the same objective as training a classifier to put high probability on the expert action.
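A minimal numpy sketch of this objective (the function name and array shapes are illustrative, not from any library): the policy outputs unnormalized logits over actions, and the loss is the average negative log-probability assigned to the expert's actions.

```python
import numpy as np

def bc_loss_discrete(logits, expert_actions):
    """Cross-entropy / negative log-likelihood BC loss.

    logits: (batch, num_actions) unnormalized policy scores
    expert_actions: (batch,) integer indices of the expert's actions
    """
    # Numerically stable log-softmax over the action dimension
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-probability the policy assigns to each expert action
    nll = -log_probs[np.arange(len(expert_actions)), expert_actions]
    return nll.mean()

# Example: two states, three possible actions
logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
expert_actions = np.array([0, 2])
loss = bc_loss_discrete(logits, expert_actions)
```

Minimizing this loss pushes the policy to put high probability on the expert action in each state, exactly like training a classifier.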
Continuous Actions
For continuous actions, the BC loss is often the mean squared error between the predicted action and the expert action:

$$\mathcal{L}_{\text{BC}}(\theta) = \mathbb{E}_{(s, a) \sim \mathcal{D}}\left[\lVert \pi_\theta(s) - a \rVert^2\right]$$

where:
- $\pi_\theta(s)$ = action predicted by the learned policy
- $a$ = expert action
- $s$ = state or observation
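The continuous-action case reduces to a plain regression loss; a small numpy sketch (names and values are illustrative):

```python
import numpy as np

def bc_loss_continuous(pred_actions, expert_actions):
    """Mean squared error between predicted and expert actions."""
    return np.mean((pred_actions - expert_actions) ** 2)

pred = np.array([[0.1, -0.2], [0.5, 0.0]])   # policy outputs pi_theta(s)
expert = np.array([[0.0, 0.0], [0.4, 0.1]])  # expert actions a
loss = bc_loss_continuous(pred, expert)
```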
Intuition
Behavior cloning treats imitation learning as supervised learning:
- input: state / observation
- label: expert action
- model: policy
- loss: how different the policy’s action is from the expert’s action
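The supervised-learning framing above can be sketched end to end. This toy example (all data and the linear policy are invented for illustration) fits a policy to synthetic expert demonstrations by gradient descent on the MSE BC loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic expert data: actions are a fixed linear function of states
states = rng.normal(size=(256, 4))          # inputs: states / observations
W_expert = rng.normal(size=(4, 2))
actions = states @ W_expert                 # labels: expert actions

# Linear policy pi_W(s) = s @ W, trained on the BC (MSE) loss
W = np.zeros((4, 2))
lr = 0.1
for _ in range(200):
    pred = states @ W
    grad = 2 * states.T @ (pred - actions) / len(states)  # d(MSE)/dW
    W -= lr * grad

final_loss = np.mean((states @ W - actions) ** 2)
```

Because the demonstrations here are exactly linear in the state, the learned policy recovers the expert's mapping; with a neural-network policy the loop is the same, only the model and optimizer change.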
Limitation
BC only learns from the states seen in the expert dataset. If the policy makes a mistake and enters an unfamiliar state, errors can compound because the model may not know how to recover.
This is called distribution shift or covariate shift.