Multinomial Naive Bayes Classifier

A complete worked example for text-review classification

Yoann Mocquin
Towards Data Science

In this new post, we are going to try to understand how the multinomial naive Bayes classifier works and provide examples with Python and scikit-learn.

What we’ll see:

  • What is the multinomial distribution: as opposed to Gaussian Naive Bayes classifiers, which rely on an assumed Gaussian distribution, multinomial naive Bayes classifiers rely on a multinomial distribution.
  • The general approach used to create classifiers that rely on Bayes' theorem, together with the naive assumption that the input features are independent of each other given a class.
  • How a multinomial classifier is “fitted” by computing/estimating the multinomial probabilities for each class, using the smoothing trick to handle empty features.
  • How the probabilities of a new sample are computed, using the log-space trick to avoid underflow (a quick end-to-end sketch with scikit-learn follows this list).
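
Before digging into each of these points, here is a minimal end-to-end sketch using scikit-learn's CountVectorizer and MultinomialNB, just to keep the big picture in mind; the tiny review dataset and its labels are made up purely for illustration.

# Minimal sketch: classify toy text reviews with multinomial naive Bayes.
# The reviews and labels are invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

reviews = [
    "great movie loved the plot",
    "terrible movie waste of time",
    "loved the acting great plot",
    "boring plot terrible acting",
]
labels = ["positive", "negative", "positive", "negative"]

# Turn each review into a vector of word counts (the multinomial feature counts)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)

# Fit the classifier; alpha=1.0 is the default Laplace smoothing discussed later
model = MultinomialNB(alpha=1.0)
model.fit(X, labels)

# Score a new review
new_review = vectorizer.transform(["great acting but boring plot"])
print(model.predict(new_review))
print(model.predict_proba(new_review))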

All images by author.

If you are already familiar with the multinomial distribution, you can move on to the next part.

Representation of 2 multinomial distributions (with 10 parameters). Those represent the probability that a given word appears in a text review.

The first important step to understand the Multinomial Naive Bayes classifier is to understand what a multinomial distribution is.

In simple words, it represents the probabilities of an experiment that can have a finite number of outcomes and that is repeated N times: for example, rolling a die with 6 faces, say 10 times, and counting the number of times each face appears. Another example is counting the number of occurrences of each word of a vocabulary in a text.
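
As a quick illustration of those two examples, here is a small sketch with numpy and the standard library; the specific rolls and the toy sentence are arbitrary.

import numpy as np
from collections import Counter

# Example 1: roll a fair 6-sided die N=10 times and count how often each face appears
rng = np.random.default_rng(0)
rolls = rng.integers(low=1, high=7, size=10)  # 10 rolls, faces 1..6
face_counts = np.bincount(rolls, minlength=7)[1:]
print(face_counts, face_counts.sum())  # the 6 counts sum to N=10

# Example 2: count how many times each vocabulary word appears in a text
text = "great movie great plot"
word_counts = Counter(text.split())
print(word_counts)  # Counter({'great': 2, 'movie': 1, 'plot': 1})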

You can also see the multinomial distribution as an extension of the binomial distribution: instead of tossing a coin with 2 possible outcomes (binomial), you roll a die with 6 outcomes (multinomial). As for the binomial distribution, the probabilities of all the possible outcomes must sum to 1. So we could have, for example, the probability vector sketched below.
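
Here, the made-up probabilities of a biased 6-sided die sum to 1, and scipy.stats.multinomial gives the probability of a particular set of face counts over N=10 rolls; the numbers are arbitrary and only serve as an illustration.

import numpy as np
from scipy.stats import multinomial

# One probability per face of the die; arbitrary values that sum to 1
p = np.array([0.1, 0.1, 0.2, 0.2, 0.2, 0.2])
print(p.sum())  # 1.0

# Probability of observing these face counts over N=10 rolls under p
counts = [1, 2, 2, 2, 2, 1]  # the counts also sum to N=10
print(multinomial.pmf(counts, n=10, p=p))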
