Warm-up steps gradually increase the learning rate from a very small value to the target learning rate over the first part of training.
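A minimal sketch of one common warm-up shape follows. It assumes a *linear* ramp, which is only one option; the function name `warmup_lr` and its `start_lr` parameter are hypothetical and chosen for illustration. Once warm-up ends, the sketch simply holds the target rate, whereas real schedules usually switch to some decay at that point.

```python
def warmup_lr(step: int, target_lr: float, warmup_steps: int, start_lr: float = 0.0) -> float:
    """Hypothetical helper: linearly ramp the learning rate from start_lr to target_lr.

    Assumes a linear warm-up; other shapes (e.g. exponential) are also used in practice.
    """
    # No warm-up configured, or warm-up already finished: use the target rate.
    if warmup_steps <= 0 or step >= warmup_steps:
        return target_lr
    # Fraction of the warm-up period completed so far, in [0, 1).
    frac = step / warmup_steps
    # Interpolate between the small starting rate and the target rate.
    return start_lr + (target_lr - start_lr) * frac


# Example: ramp to 0.1 over 10 steps.
for step in (0, 5, 10, 20):
    print(step, warmup_lr(step, target_lr=0.1, warmup_steps=10))
```

With these settings the rate starts at 0.0, is halfway to the target at step 5, and stays at 0.1 from step 10 onward.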