The concept of taking numeric values and grouping them into finite set of categories so you can treat them as discrete labels instead of continuous numbers

Pros:

  • Simplify analysis/visualization
  • Reduce noise
  • Make models easier to interpret
  • Handle non-linear relationships Cons:
  • Information loss = values collapse into the same bin
  • Boundary sensitivity
  • Too many bins = noisy, sparse counts
  • Too few bins = oversimplified, hides structure