Behavior cloning (BC) loss is the supervised-learning objective used to train a policy to imitate expert demonstrations.

Given a dataset of expert demonstrations

D = {(s_i, a_i*)}, i = 1, …, N,

behavior cloning trains a policy π_θ(a | s) to predict the expert action a_i* from the state s_i.
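
As a concrete (hypothetical) illustration, the dataset is just a collection of state–action pairs; the states and actions below are made-up values, not from any real environment:

```python
# Hypothetical expert dataset: each entry pairs an observed state with
# the expert's chosen action (discrete action indices in this example).
expert_dataset = [
    ((0.0, 1.2), 1),   # (state, expert action)
    ((0.3, 0.9), 0),
    ((0.7, 0.4), 1),
]

# Split into supervised-learning inputs (states) and labels (actions)
states = [s for s, _ in expert_dataset]
actions = [a for _, a in expert_dataset]
```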

Discrete Actions

For discrete actions, BC loss is usually cross entropy / negative log likelihood:

L(θ) = −E_{(s, a*) ∼ D}[ log π_θ(a* | s) ]

This is the same objective as training a classifier to put high probability on the expert action.
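
A minimal numpy sketch of this loss (the function name and example logits are illustrative, not from the source):

```python
import numpy as np

def bc_cross_entropy_loss(logits, expert_actions):
    """Average negative log-likelihood of the expert actions under the policy.

    logits: (N, num_actions) unnormalised policy scores, one row per state
    expert_actions: (N,) integer indices of the expert's chosen actions
    """
    # Log-softmax computed in a numerically stable way
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick out log pi(a* | s) for each expert action and average the NLL
    nll = -log_probs[np.arange(len(expert_actions)), expert_actions]
    return nll.mean()

# Two states, two actions; the expert chose action 0 then action 1
logits = np.array([[2.0, 0.5],
                   [0.1, 1.5]])
expert_actions = np.array([0, 1])
loss = bc_cross_entropy_loss(logits, expert_actions)
```

Driving the loss down pushes probability mass onto the expert's actions, which is exactly the classifier analogy in the text.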

Continuous Actions

For continuous actions, BC loss is often mean squared error between the predicted action and the expert action:

L(θ) = E_{(s, a*) ∼ D}[ ‖ π_θ(s) − a* ‖² ]

  • π_θ(s) = action predicted by the learned policy
  • a* = expert action
  • s = state or observation
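
A small sketch of the continuous-action case, assuming 2-dimensional actions (all values here are illustrative):

```python
import numpy as np

def bc_mse_loss(predicted_actions, expert_actions):
    """Mean over the batch of the squared error between predicted
    and expert actions (each row is one action vector)."""
    diff = predicted_actions - expert_actions
    return np.mean(np.sum(diff ** 2, axis=-1))

# Two states with 2-D continuous actions
pred = np.array([[0.1, -0.2],
                 [0.4,  0.0]])   # actions from the learned policy
expert = np.array([[0.0, -0.3],
                   [0.5,  0.1]]) # expert's actions in the same states
loss = bc_mse_loss(pred, expert)  # small, since predictions are close
```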

Intuition

Behavior cloning treats imitation learning as supervised learning:

  • input: state / observation
  • label: expert action
  • model: policy
  • loss: how different the policy’s action is from the expert’s action
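
The supervised-learning loop above can be sketched end to end. This is a toy setup under stated assumptions: a synthetic "expert" that is an exact linear map, a linear policy, and plain gradient descent on the MSE loss (none of this comes from the source):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "expert": a fixed linear mapping from state to action
true_W = np.array([[1.0, -0.5]])
states = rng.normal(size=(200, 2))        # inputs: states
expert_actions = states @ true_W.T        # labels: expert actions

# Model: a linear policy a = W s, trained by gradient descent on MSE
W = np.zeros((1, 2))
lr = 0.1
for _ in range(500):
    pred = states @ W.T                               # policy's predicted actions
    grad = 2 * (pred - expert_actions).T @ states / len(states)  # dMSE/dW
    W -= lr * grad                                    # gradient step on the BC loss

final_loss = np.mean((states @ W.T - expert_actions) ** 2)
```

Because the expert here is itself linear, the policy can recover it almost exactly; with a real expert and a neural-network policy, the loop is the same but the fit is only approximate.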

Limitation

BC only learns from the states seen in the expert dataset. If the policy makes a mistake and enters an unfamiliar state, errors can compound because the model may not know how to recover.

This is called distribution shift or covariate shift.