Naïve Bayes is a simple yet surprisingly powerful algorithm for predictive modeling. It is a fast, uncomplicated classification algorithm that can handle large volumes of data, including millions of records. It is a common choice for Natural Language Processing tasks such as sentiment analysis, where it often gives good results. In machine learning, the data is first prepared for the algorithm, the model then learns from the training data, and the trained model is used to make predictions. This article is a **machine learning for beginners** article: it assumes that readers do not come from a statistical background but know a little about probability. The aim is to cover the basics of Bayes’ Theorem, the Naïve Bayes classifier and its types, and its applications.

**Table of Contents**

- Bayes Theorem
- Naïve Bayes and its types
- Applications
- Takeaway

As a part of **machine learning training**, it is essential to understand the Bayes Theorem before learning about the Naïve Bayes Classifier and its use in Machine Learning.

**Bayes Theorem**

When we are given data in machine learning, denoted by d, our interest lies in selecting the best hypothesis, denoted by h. For classification, a new data instance (d) needs to be assigned to a class, which is our hypothesis (h). The simplest way of selecting the most probable hypothesis given the data is to use the knowledge we already have about the problem. Bayes’ Theorem lets us calculate the probability of a hypothesis given this prior knowledge. It is a statement about conditional probability and is stated as

P(h|d) = (P(d|h) * P(h)) / P(d)

Where

- P(h|d), called the posterior probability, is the probability of hypothesis h given the data d.
- P(d|h), called the likelihood, is the probability of the data d given that the hypothesis h is true.
- P(h), called the prior probability of h, is the probability of hypothesis h being true (irrespective of the data).
- P(d) is the probability of the data, regardless of the hypothesis.

Our interest lies in calculating the posterior probability P(h|d) when we are given the prior probability P(h) along with P(d) and P(d|h).

In simple words, Bayes’ Theorem gives the probability of one event (here, the hypothesis h) given that another event (here, the data d) has already occurred.
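As a worked illustration of the formula, the sketch below plugs hypothetical numbers into Bayes’ Theorem. The function name `posterior` and the spam-filter figures are assumptions made up for this example, not values taken from any dataset.

```python
def posterior(likelihood, prior, evidence):
    """Return P(h|d) = P(d|h) * P(h) / P(d)."""
    if evidence <= 0:
        raise ValueError("P(d) must be positive")
    return likelihood * prior / evidence


# Hypothetical setup: h = "the email is spam", d = "the email contains 'offer'".
p_h = 0.2               # P(h): prior probability that an email is spam
p_d_given_h = 0.6       # P(d|h): probability of 'offer' appearing in spam
p_d_given_not_h = 0.05  # probability of 'offer' appearing in non-spam

# P(d) from the law of total probability over "spam" and "not spam".
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)

p_h_given_d = posterior(p_d_given_h, p_h, p_d)
print(round(p_h_given_d, 3))
```

With these made-up numbers, seeing the word raises the probability of spam from the prior of 0.2 to a posterior of about 0.75.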

We can select the hypothesis with the highest probability after the posterior probability has been calculated for several different hypotheses. This is called the most probable hypothesis and is formally referred to as the maximum a posteriori (MAP) hypothesis. It can be written as:

MAP(h) = max(P(h|d))

or MAP(h) = max((P(d|h) * P(h)) / P(d))

or MAP(h) = max(P(d|h) * P(h)).

P(d) is a normalizing term that turns the result into a probability. Because it is constant across hypotheses, it can be dropped when we are only interested in the most probable hypothesis. Coming to classification, if there is an even number of instances in each class of our training data, the prior probability P(h) of each class will be the same. It is then also a constant term in the equation and can be dropped, leaving us with: MAP(h) = max(P(d|h)). It is useful to know all these forms of the theorem, as you may come across them in further reading.
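The different forms of the MAP rule can be compared with a small sketch. The class names and probabilities below are hypothetical, and `map_hypothesis` is a helper name invented for this illustration.

```python
def map_hypothesis(likelihoods, priors):
    """Return the hypothesis maximizing P(d|h) * P(h), plus all the scores.

    P(d) is left out because it is the same for every hypothesis and
    therefore does not change which hypothesis scores highest.
    """
    scores = {h: likelihoods[h] * priors[h] for h in likelihoods}
    return max(scores, key=scores.get), scores


# Hypothetical values of P(d|h) and P(h) for three classes.
likelihoods = {"sports": 0.10, "politics": 0.30, "education": 0.05}
priors = {"sports": 0.70, "politics": 0.20, "education": 0.10}

best, scores = map_hypothesis(likelihoods, priors)
print(best)

# Dividing by P(d) (the sum of the scores) gives posteriors that sum to 1,
# with the same winner as before.
p_d = sum(scores.values())
posteriors = {h: s / p_d for h, s in scores.items()}

# With equal priors, the rule reduces to max(P(d|h)).
uniform = {h: 1 / len(likelihoods) for h in likelihoods}
best_uniform, _ = map_hypothesis(likelihoods, uniform)
print(best_uniform)
```

In this made-up case the unequal priors favor "sports", while equal priors leave the decision to the likelihood alone and pick "politics", which is why the prior can only be dropped when the classes are balanced.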

**Naïve Bayes and its types**

A classification technique based on Bayes’ Theorem in which the predictors are assumed to be independent is called the Naïve Bayes Classifier. In other words, the classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For instance, a fruit may be considered an apple if it is round, red, and about three inches in diameter. Even if these features depend on each other or on the existence of other features, each property is treated as contributing independently to the probability of the fruit being an apple. This is why the algorithm is known as ‘Naïve.’ As mentioned earlier, the algorithm is easy to build and useful for massive data sets. Despite its simplicity, it can sometimes outperform more sophisticated classification methods. The three types of Naïve Bayes model are:
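The independence assumption means the likelihood of all features together is simply the product of the per-feature likelihoods. A minimal sketch of this, using the fruit example with hypothetical probabilities (the numbers and the helper `naive_score` are assumptions for illustration):

```python
import math


def naive_score(prior, feature_likelihoods):
    """Return P(h) * product of P(feature_i | h), assuming independent features."""
    return prior * math.prod(feature_likelihoods)


# Hypothetical P(round|h), P(red|h), P(about 3 inches|h) for two classes,
# each with a prior of 0.5.
apple_score = naive_score(0.5, [0.9, 0.8, 0.7])
orange_score = naive_score(0.5, [0.9, 0.1, 0.6])

label = "apple" if apple_score > orange_score else "orange"
print(label)
```

The scores are not normalized probabilities, since P(d) is omitted, but comparing them is enough to choose a class.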

- **Gaussian Naïve Bayes**: This model assumes that features follow a normal distribution. If the predictors take continuous values instead of discrete ones, the model assumes that these values are sampled from a Gaussian distribution.
- **Multinomial Naïve Bayes**: This type of classifier is used when the data follows a multinomial distribution. It comes in handy for document classification problems, such as deciding whether a particular document belongs to a category like politics, education, or sports. The frequency of words is used as the predictor for classification.
- **Bernoulli Naïve Bayes**: This classifier works similarly to the Multinomial classifier. However, the predictor variables are independent Boolean variables, such as whether a particular word is present in the document or not. This model is also popular for document classification.
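To make the Gaussian variant concrete, here is a minimal from-scratch sketch using only the standard library. The toy measurements, the function names, and the small variance floor are assumptions of this example; a production library would handle many more edge cases.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A tiny constant is added so the variance is never zero
        # (an assumption of this sketch).
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log P(h) + sum of log P(x_i|h)."""
    def log_score(prior, means, variances):
        total = math.log(prior)
        for x, m, var in zip(row, means, variances):
            # Log of the Gaussian density with mean m and variance var.
            total += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_score(*model[label]))


# Hypothetical toy data: [diameter in inches, weight in grams].
X = [[3.0, 150], [3.2, 160], [2.9, 145], [1.0, 10], [1.1, 12], [0.9, 9]]
y = ["apple", "apple", "apple", "cherry", "cherry", "cherry"]

model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [3.1, 155]))
print(predict_gaussian_nb(model, [1.0, 11]))
```

Working with log probabilities instead of multiplying raw densities is a common design choice, because products of many small numbers can underflow.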

**Applications**

Following are some of the applications of Naïve Bayes in Machine Learning:

- **Face Recognition**: The classifier can be used to identify faces and facial features such as the mouth, eyes, and nose.
- **Real-Time Prediction**: It is an eager learning classifier and is fast, so it can be used for making predictions in real time.
- **Weather Prediction**: A time series prediction to classify the weather as good or bad.
- **Multiclass Prediction**: The algorithm can easily predict the probability of multiple classes of the target variable.
- **Medical Diagnosis**: Doctors can use the information that the classifier provides to help diagnose patients. Naïve Bayes is used by healthcare professionals to indicate whether a patient is at high risk for specific conditions and diseases such as cancer or heart attack.
- **Recommendation System**: Collaborative Filtering and the Naïve Bayes classifier can be used together to build a recommendation system that uses data mining techniques and machine learning to filter unseen information and predict whether a user would be interested in a resource or not.
- **Sentiment Analysis**: Naïve Bayes is heavily employed in text classification, spam filtering, and news classification.
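As an illustration of the text classification use case, the sketch below trains a tiny multinomial Naïve Bayes sentiment classifier on word frequencies. The reviews, labels, and function names are hypothetical, and add-one (Laplace) smoothing is an assumption of this sketch that the discussion above does not cover; it keeps unseen word and class pairs from producing a zero probability.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels):
    """Count word frequencies per class and the number of documents per class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for text, label in zip(docs, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def classify(text, word_counts, class_counts, vocab):
    """Return the class maximizing log P(h) + sum of log P(word|h)."""
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing, an assumption of this sketch.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training reviews.
docs = ["great movie loved it", "wonderful acting great plot",
        "terrible movie hated it", "boring plot awful acting"]
labels = ["positive", "positive", "negative", "negative"]

model = train_multinomial_nb(docs, labels)
print(classify("loved the great acting", *model))
print(classify("awful boring movie", *model))
```

Replacing the word counts with presence or absence flags would turn this into the Bernoulli variant in spirit, although a full Bernoulli model also accounts for the words that are absent from a document.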

**Takeaway**

This blog discussed the Naïve Bayes algorithm, its use in classification, and its other machine learning applications. As an outcome of the **machine learning course**, one should know about Bayes’ Theorem, its calculation in practice, its representation, how to train the model for making predictions after data preparation, and Gaussian Naïve Bayes (the adaptation of Naïve Bayes for real-valued input data). If you wish to learn all this and accelerate your career as a **machine learning expert**, sign up for **machine learning certification**.