Naive Bayes algorithm: Prior, likelihood and marginal likelihood


Introduction

The Naive Bayes algorithm is a probabilistic classification technique based on Bayes' theorem. It rests on the assumption that the presence of a feature in a class is independent of the presence of the other features. The technique is used in text categorization, sentiment analysis, spam filtering, and image recognition, among many other applications. Bayes' theorem, a key result in probability theory, provides a way to compute the probability of an event from the probabilities of related events. Its theoretical foundation is conditional probability: the probability of an event occurring given that another event has occurred.

Prior, likelihood and marginal likelihood

According to Bayes' theorem, the probability of a hypothesis (H) given evidence (E) equals the likelihood of the evidence given the hypothesis, times the prior probability of the hypothesis, divided by the probability of the evidence. Mathematically, it is expressed as −

P(H|E) = P(E|H) * P(H) / P(E)

where P(H) is the prior probability of the hypothesis, P(E|H) is the likelihood of the evidence given the hypothesis, P(E) is the marginal likelihood of the evidence, and P(H|E) is the posterior probability of the hypothesis given the evidence.
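As a quick illustration, here is a minimal Python sketch of this formula. The function name and the probability values are made up for the example, not part of any library.

# A minimal sketch of Bayes' theorem; the numbers are illustrative only.
def bayes_posterior(prior, likelihood, marginal_likelihood):
    # P(H|E) = P(E|H) * P(H) / P(E)
    return likelihood * prior / marginal_likelihood

print(bayes_posterior(prior=0.6, likelihood=0.5, marginal_likelihood=0.4))  # 0.75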

The Naive Bayes method makes a strong assumption about the independence of features: given the class, the features are assumed to be independent of one another. Although this assumption is rarely exact in practice, the method can still deliver useful results.

Prior probability

In the Naive Bayes method, the prior probability of a class is the probability of that class before any features are observed. In a binary classification problem with classes A and B, for instance, the prior probability of class A is the probability that an object belongs to class A before any of its features are seen, and likewise for class B.

The prior probability is usually estimated from the training data as the proportion of objects in each class. If the training data contains 100 objects, 60 of which belong to class A and 40 to class B, the prior probabilities of classes A and B are 0.6 and 0.4, respectively.
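The following Python sketch estimates the priors from class counts; the synthetic label list simply reproduces the 60/40 split in the example above.

# Estimating prior probabilities from class counts in the training data.
from collections import Counter

labels = ["A"] * 60 + ["B"] * 40      # synthetic labels matching the example
counts = Counter(labels)
total = sum(counts.values())
priors = {c: n / total for c, n in counts.items()}
print(priors)  # {'A': 0.6, 'B': 0.4}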

Likelihood

In the Naive Bayes method, the likelihood is the probability of observing each feature given the class. For instance, with two features, X1 and X2, and two classes, A and B, the likelihood of feature X1 given class A is the probability of observing X1 in objects belonging to class A, and the likelihood of feature X2 given class B is the probability of observing X2 in objects belonging to class B.

The likelihood is typically estimated from the training data by counting how many objects in each class possess each feature. For instance, if class A contains 60 objects and 30 of them exhibit feature X1, the likelihood of X1 given class A is 30/60 = 0.5.
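The same counting estimate in Python, using synthetic data that matches the example (60 class-A objects, 30 of which have X1):

# Estimating the likelihood P(X1 | A) by counting class-A objects with X1.
class_a = [{"X1": True} for _ in range(30)] + [{"X1": False} for _ in range(30)]
likelihood_x1_given_a = sum(obj["X1"] for obj in class_a) / len(class_a)
print(likelihood_x1_given_a)  # 0.5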

Marginal likelihood

In the Naive Bayes method, the marginal likelihood is the probability of observing the evidence, where the evidence is the set of feature values observed for an object. For instance, with two features, X1 and X2, an object that possesses X1 but not X2 has the evidence {X1, not X2}.

The marginal likelihood is obtained by summing, over all classes, the product of the prior probability and the likelihood of the evidence. For instance, suppose there are two classes, A and B, with prior probabilities 0.6 and 0.4, respectively, and the likelihoods are P(X1|A) = 0.5 and P(X2|A) = 0.3 for class A, and P(X1|B) = 0.2 and P(X2|B) = 0.8 for class B. The marginal likelihood of observing X1 but not X2 is then −

P({X1, not X2}) = P(X1|A)P(not X2|A)P(A) + P(X1|B)P(not X2|B)P(B)

= 0.5 * 0.7 * 0.6 + 0.2 * 0.2 * 0.4

= 0.21 + 0.016 = 0.226
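This sum over classes can be written directly in Python, using the priors and likelihoods from the example above (note that P(not X2 | class) = 1 − P(X2 | class)):

# Marginal likelihood of the evidence {X1, not X2}, summed over classes.
priors = {"A": 0.6, "B": 0.4}
p_x1 = {"A": 0.5, "B": 0.2}   # P(X1 | class)
p_x2 = {"A": 0.3, "B": 0.8}   # P(X2 | class)

marginal = sum(p_x1[c] * (1 - p_x2[c]) * priors[c] for c in priors)
print(round(marginal, 3))  # 0.226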

Posterior probability

In the Naive Bayes algorithm, the posterior probability is the probability of each class given the evidence. It is computed from the prior probability, the likelihood, and the marginal likelihood using Bayes' theorem, and the class with the highest posterior probability is chosen as the predicted class for the object.

For instance, with two classes, A and B, evidence {X1, not X2}, and the priors and likelihoods determined above, the posterior probabilities are −

P(A|{X1, not X2}) = P({X1, not X2}|A)P(A) / P({X1, not X2}) = 0.5 * 0.7 * 0.6 / 0.226 ≈ 0.93

P(B|{X1, not X2}) = P({X1, not X2}|B)P(B) / P({X1, not X2}) = 0.2 * 0.2 * 0.4 / 0.226 ≈ 0.07

Therefore, the predicted class for the object would be A.
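Putting the whole worked example together, this self-contained Python sketch computes the posteriors and picks the predicted class; all numbers come from the example above.

# Posterior probabilities for the worked example, and the predicted class.
priors = {"A": 0.6, "B": 0.4}
p_x1 = {"A": 0.5, "B": 0.2}   # P(X1 | class)
p_x2 = {"A": 0.3, "B": 0.8}   # P(X2 | class)

joint = {c: p_x1[c] * (1 - p_x2[c]) * priors[c] for c in priors}  # P(E|c) * P(c)
marginal = sum(joint.values())                                    # P(E) = 0.226
posteriors = {c: j / marginal for c, j in joint.items()}
print({c: round(p, 2) for c, p in posteriors.items()})  # {'A': 0.93, 'B': 0.07}
print(max(posteriors, key=posteriors.get))              # A

Because the posteriors are obtained by dividing each joint term by their sum, they always add up to 1, which is a useful sanity check on the arithmetic.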

Conclusion

In conclusion, the Naive Bayes method is a probabilistic classifier that uses Bayes' theorem to compute the posterior probability of each class given the evidence. The prior probability is the probability of each class before any features are observed, the likelihood is the probability of observing each feature given the class, and the marginal likelihood is the probability of observing the evidence. The Naive Bayes approach assumes that the features are independent of one another given the class. Despite this simplifying assumption, it is widely used and frequently effective in real-world scenarios.
