Naive Bayes Classifier

Ifeoma Veronica Nwabufo
May 15, 2022

This is a type of classifier that is often used as a baseline for text classification. The classifier is based on Bayes Theorem, hence it is a probabilistic model. It is called Naive Bayes because the order of words in a sentence does not impact the result of the classifier. For example, the sentences:

The boy is kind, not so?

and

The boy is not so kind?

are considered the same by Naive Bayes, even though we know they do not mean the same thing! This property of the model is known as the independence assumption of Naive Bayes.

Though Naive Bayes does not consider the order of words, it considers the multiplicities of the words in a sentence. For example:

what a great day

is not the same as

what a great great day.
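In other words, Naive Bayes sees a sentence as a bag of words. Here is a minimal sketch in Python, using collections.Counter, of that representation; word order is discarded but multiplicities are kept:

```python
from collections import Counter
import string

def bag_of_words(sentence):
    """Represent a sentence by its word counts: order is ignored, multiplicity is kept."""
    cleaned = sentence.lower().translate(str.maketrans("", "", string.punctuation))
    return Counter(cleaned.split())

# The same words in a different order give the same representation:
print(bag_of_words("The boy is kind, not so?") == bag_of_words("The boy is not so kind?"))  # True

# A repeated word gives a different representation, because counts matter:
print(bag_of_words("what a great day") == bag_of_words("what a great great day"))  # False
```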

Now, let us look at Bayes Theorem.

Bayes Theorem

Let A and B be events. Then we define the following conditional probabilities:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \qquad (1)$$

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \qquad (2)$$

Conditional Probabilities for events A and B

Note: P(A∩B) is called the joint probability of A and B.

Since equations (1) and (2) share the numerator P(A∩B), we can combine them to obtain:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \qquad (3)$$

Bayes Theorem

Equation (3) is Bayes Theorem, where

P(A) is the prior probability — the probability of A before the evidence has been seen;

P(B) is the marginal probability — the probability of the evidence;

P(B|A) is the likelihood of B given A — the probability of the evidence given that A is true; and

P(A|B) is the posterior probability — the probability of A after the evidence has been seen.
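As a quick numeric illustration of equation (3), here is a small Python sketch; the numbers are made up for illustration (say, a test for some condition) and are not from this article.

```python
# Hypothetical numbers, for illustration only (e.g. a test for some condition).
p_a = 0.01          # P(A): prior probability of the condition
p_b_given_a = 0.90  # P(B|A): likelihood of a positive test given the condition
p_b = 0.05          # P(B): marginal probability of a positive test

# Equation (3): posterior = likelihood * prior / marginal
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 2))  # 0.18
```

Even with a 90% likelihood, the posterior is only 0.18, because the prior P(A) is small.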

In the Naive Bayes text classification problem, we are interested in finding the probability of a label given the words. Hence, we can rewrite equation (3) in terms of the label and the words:

$$P(\text{label} \mid \text{words}) = \frac{P(\text{words} \mid \text{label})\, P(\text{label})}{P(\text{words})} \qquad (4)$$

Bayes Theorem in terms of label and words

Simplifying further: since P(words) is the same for every label, we can drop it when choosing the most probable label, and we have:

$$\hat{y} = \arg\max_{\text{label}}\, P(\text{label})\, P(w_1, \dots, w_n \mid \text{label}) \qquad (5)$$

Naive Bayes Classifier

We want the model to be easy to estimate, so from equation (5) we make the naive assumption that the probability of a word given a class is independent of the previous words, and then we have:

$$\hat{y} = \arg\max_{\text{label}}\, P(\text{label}) \prod_{i=1}^{n} P(w_i \mid \text{label}) \qquad (6)$$

Naive assumption that gives the Naive Bayes theorem

This assumption is what makes us call the model naive: we know it does not hold in many cases, because the probability of one word given a class may depend on the probability of another word given that class.
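To see what the quantity maximized in equation (6) looks like in code, here is a sketch that scores a sentence under made-up estimates of P(label) and P(word | label); the numbers are hypothetical. In practice the product of many small probabilities underflows, so real implementations usually sum log-probabilities instead; the plain product is shown here for clarity.

```python
from math import prod  # Python 3.8+

# Hypothetical estimates of P(label) and P(word | label), for illustration only.
prior = {"positive": 0.6, "negative": 0.4}
likelihood = {
    "positive": {"great": 0.30, "day": 0.20},
    "negative": {"great": 0.05, "day": 0.15},
}

def score(label, words):
    """The quantity maximized in equation (6): P(label) * prod of P(word | label)."""
    return prior[label] * prod(likelihood[label][w] for w in words)

words = ["great", "great", "day"]  # "great" is counted twice
for label in prior:
    print(label, score(label, words))
# positive: 0.6 * 0.30 * 0.30 * 0.20 = 0.0108
# negative: 0.4 * 0.05 * 0.05 * 0.15 = 0.00015
```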

Simple Example

Suppose we have 5 sentences from movie reviews, as follows:

Table 1: Movie reviews

And we want to predict the review of a new sentence:

Table 2: New Review to Predict

We can construct a table to summarize the reviews from Table 1 as follows:

Table 3: Summary of Movies Review Table

And calculate the most probable review as follows:

From the above, we see that the positive class has the higher probability, so the new sentence is classified as a positive review.
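To make the whole procedure concrete, here is a minimal end-to-end sketch in Python. The five training reviews and the new review below are hypothetical stand-ins for Tables 1 and 2, not the article's data, and the add-one (Laplace) smoothing is a common choice rather than necessarily what the calculation above used.

```python
from collections import Counter
from math import prod

# Hypothetical training reviews: stand-ins for Table 1, not the article's data.
train = [
    ("what a great great day", "positive"),
    ("a truly great movie", "positive"),
    ("the acting was kind and moving", "positive"),
    ("what a boring movie", "negative"),
    ("the plot was boring and predictable", "negative"),
]

# Label counts and per-label word counts play the role of the summary in Table 3.
label_counts = Counter(label for _, label in train)
word_counts = {label: Counter() for label in label_counts}
for sentence, label in train:
    word_counts[label].update(sentence.split())

vocab = {w for counts in word_counts.values() for w in counts}

def posterior_scores(sentence):
    """Score each label by P(label) * prod_i P(w_i | label), as in equation (6).

    Add-one (Laplace) smoothing keeps a word unseen under some label from
    forcing the whole product to zero.
    """
    words = sentence.split()
    scores = {}
    for label, n_docs in label_counts.items():
        prior = n_docs / len(train)
        total = sum(word_counts[label].values())
        scores[label] = prior * prod(
            (word_counts[label][w] + 1) / (total + len(vocab)) for w in words
        )
    return scores

# A hypothetical new review, standing in for Table 2.
new_review = "a great plot"
scores = posterior_scores(new_review)
print(scores)
print("prediction:", max(scores, key=scores.get))
```

The label with the highest score is the prediction; dividing each score by their sum would recover proper posterior probabilities.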

The above tutorial is inspired by the NLP courses taught at the African Masters in Machine Intelligence programme by Armand Joulin and Edouard Grave of Facebook AI Research.
