The Softplus function, defined as softplus(x) = ln(1 + e^x), is a smooth version of the ReLU activation function max(0, x).

It is commonly used as an activation function in neural networks.
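
As a quick illustration, here is a minimal NumPy sketch of both functions (the function names are just illustrative, not from the original):

```python
import numpy as np

def relu(x):
    # ReLU: max(0, x), with a sharp corner at x = 0
    return np.maximum(0.0, x)

def softplus(x):
    # Softplus: ln(1 + e^x), a smooth approximation of ReLU
    return np.log(1.0 + np.exp(x))

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))      # [0. 0. 2.]
print(softplus(x))  # approximately [0.127 0.693 2.127]
```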

Intuition

Softplus behaves like a smoothed-out version of ReLU:

For large positive values of x, e^x dominates, so:

ln(1 + e^x) ≈ ln(e^x) = x

For large negative values of x, e^x is close to 0, so:

ln(1 + e^x) ≈ ln(1) = 0

So Softplus is close to 0 for negative inputs and close to x for positive inputs, but it transitions smoothly instead of having a sharp corner at x = 0.
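
To make the two limits concrete, here is a small numeric check (an illustrative sketch, using the same naive softplus as above):

```python
import numpy as np

def softplus(x):
    return np.log(1.0 + np.exp(x))

# Large positive input: softplus(x) is very close to x
print(softplus(10.0))   # about 10.0000454
# Large negative input: softplus(x) is very close to 0
print(softplus(-10.0))  # about 0.0000454
```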

Derivative

The derivative of Softplus is the sigmoid function:

d/dx ln(1 + e^x) = e^x / (1 + e^x) = 1 / (1 + e^(-x)) = sigmoid(x)

This matters because Softplus is differentiable everywhere, unlike ReLU, which has a corner at x = 0.
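
A simple way to sanity-check this is to compare a finite-difference estimate of the derivative against the sigmoid (an illustrative sketch, not a reference implementation):

```python
import numpy as np

def softplus(x):
    return np.log(1.0 + np.exp(x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
h = 1e-5
# Central difference approximation of d/dx softplus(x)
numeric = (softplus(x + h) - softplus(x - h)) / (2.0 * h)
print(np.allclose(numeric, sigmoid(x), atol=1e-6))  # True
```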

Properties

  • Domain: all real numbers
  • Range: (0, ∞)
  • Smooth and differentiable everywhere
  • Always positive
  • Approaches 0 as x → -∞
  • Approaches x as x → +∞
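
These properties are easy to verify numerically; here is a small illustrative check (log1p is used so tiny values near 0 stay accurate):

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))  # log1p keeps very small results accurate

x = np.linspace(-20.0, 20.0, 1001)
y = softplus(x)

print(np.all(y > 0))                                # True: always positive
print(np.isclose(softplus(-20.0), 0.0, atol=1e-8))  # True: approaches 0 as x -> -inf
print(np.isclose(softplus(20.0), 20.0, atol=1e-8))  # True: approaches x as x -> +inf
```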

Why Use It

Softplus can be useful when you want the behavior of ReLU but need a smooth function. This can make gradient-based optimization easier in cases where the sharp corner of ReLU at x = 0 causes issues.

One tradeoff is that Softplus is more expensive to compute than ReLU, because it requires evaluating an exponential and a logarithm rather than a simple max.

Numerically Stable Form

For large x, directly computing ln(1 + e^x) can overflow, because e^x exceeds the floating-point range. A more stable, algebraically equivalent form is:

softplus(x) = max(x, 0) + ln(1 + e^(-|x|))
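
In code, this is the usual log1p trick; a sketch (illustrative, not from the original):

```python
import numpy as np

def softplus_naive(x):
    # Overflows once exp(x) exceeds the float64 range (around x ≈ 710)
    return np.log(1.0 + np.exp(x))

def softplus_stable(x):
    # softplus(x) = max(x, 0) + ln(1 + e^(-|x|)); the exponent is never
    # positive, so exp cannot overflow
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

print(softplus_stable(1000.0))   # 1000.0
print(softplus_stable(-1000.0))  # 0.0
print(softplus_naive(1000.0))    # inf, plus an overflow RuntimeWarning
```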