Categorical Policies

A categorical policy is like a classifier over discrete actions: given a state, it outputs a probability for each action.

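You build the neural network for a categorical policy the same way you would for a classifier. The input is the observation, followed by some hidden layers (dense or convolutional, depending on the kind of input), then a final linear layer that outputs one logit per action, followed by a softmax that converts the logits into probabilities.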
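A minimal pure-Python sketch of that forward pass, assuming a single ReLU hidden layer and a made-up uniform initialisation. The names `CategoricalPolicyNet` and `init_layer`, the hidden size, and the seed are all hypothetical. There is no autograd here, so this only illustrates observation → logits → probabilities; in practice you would build the same stack in a framework such as PyTorch or JAX so it can be trained.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(logits: Vector) -> Vector:
    """Convert logits to probabilities; subtracting the max avoids overflow in exp."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Hypothetical initialisation: small uniform weights, zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class CategoricalPolicyNet:
    """Observation -> ReLU hidden layer -> one logit per action -> softmax."""

    def __init__(self, obs_dim: int, hidden: int, n_actions: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_layer = init_layer(obs_dim, hidden, rng)
        self.logit_layer = init_layer(hidden, n_actions, rng)

    def logits(self, obs: Vector) -> Vector:
        h = relu(linear(obs, *self.hidden_layer))
        return linear(h, *self.logit_layer)

    def probs(self, obs: Vector) -> Vector:
        """P_theta(s): a vector with one probability per action."""
        return softmax(self.logits(obs))


net = CategoricalPolicyNet(obs_dim=4, hidden=8, n_actions=3)
p = net.probs([0.1, -0.2, 0.3, 0.0])
print(len(p), round(sum(p), 6))  # prints "3 1.0"
```

The output layer is deliberately a plain linear map producing unnormalised logits; the softmax is applied separately, just as it would be on top of a classifier's final layer.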

Log-Likelihood. Denote the last layer of probabilities as $P_{\theta}(s)$. It is a vector with as many entries as there are actions, so we can treat the actions as indices into the vector. The log-likelihood of an action $a$ can then be obtained by indexing into the vector:

$$\log \pi_{\theta}(a \mid s) = \log \left[ P_{\theta}(s) \right]_{a}.$$
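A small sketch of this indexing in Python, assuming the network exposes its logits (the values below are made up). Rather than taking `log` of the softmax output, it computes the whole log-probability vector with the log-sum-exp trick and then indexes it by the action; this is a common choice because it stays finite even when some probability underflows to zero. The helper names `log_softmax`, `log_prob`, and `batch_log_prob` are hypothetical.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """log P_theta(s) computed from logits: z_i - log(sum_j exp(z_j))."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def log_prob(logits: List[float], action: int) -> float:
    """log pi_theta(a | s) = log [P_theta(s)]_a: the action is just an index."""
    return log_softmax(logits)[action]


def batch_log_prob(batch_logits: List[List[float]], actions: List[int]) -> List[float]:
    """Log-likelihood of each taken action, one state per row."""
    return [log_prob(l, a) for l, a in zip(batch_logits, actions)]


logits = [2.0, 0.5, -1.0]  # hypothetical network output for one state s
probs = [math.exp(lp) for lp in log_softmax(logits)]
print(log_prob(logits, 0), math.log(probs[0]))  # the two values agree
```

If you use PyTorch, `torch.distributions.Categorical(logits=...)` and its `log_prob` method perform the same computation on tensors.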

Ayush Garg
