Bayes Theorem and Naive Bayes

One thing I love to do to practice machine learning is to code an algorithm myself, meaning no external packages: just implementing all the math from scratch.

Naive Bayes is a great algorithm for this, because it’s conceptually simple and still gets pretty decent results. But before we can talk about Naive Bayes, we have to talk about conditional probability.

Bayes Theorem

To understand most of statistics, we need to understand conditional probability. Bayes Theorem states that

\[
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
\]

where P(A|B) is the probability that A is true, given that B is true. A lot of modern analysis is Bayesian, because it gives us a principled way to inject prior knowledge, even subjective knowledge, into a model.

For example, what is the probability someone will have heart problems? Well, if we take into account whether the person smokes, drinks, exercises, and eats healthy foods, we will get a better estimate. This is just a brief explanation of Bayes Theorem, and if this is your first time seeing it, you can learn more here.
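To make the formula concrete, here’s a toy calculation (the numbers are made up for illustration, not real data). Suppose 5% of people have heart problems, 10% of people smoke, and 60% of people with heart problems smoke. Then

\[
P(\text{heart} \mid \text{smoke}) = \frac{P(\text{smoke} \mid \text{heart})\,P(\text{heart})}{P(\text{smoke})} = \frac{0.6 \times 0.05}{0.10} = 0.3,
\]

so knowing that someone smokes moves our estimate from 5% to 30%.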

Naive Bayes

Naive Bayes is an algorithm that uses basic probability to classify. Let’s define a few probabilities in the context of an example so it’s clearer. In this example we are classifying students as accepted or rejected by some school (the data is from UCLA!).

The prior – The probability of the class. In this case, if A = accepted, then

\[
P(A) = \frac{N_A}{N}
\]

where N_A = number of accepted students in the data, and N = number of total students in the data.

Likelihood – This is a conditional probability. If we assume some student (some vector x) is accepted, then we can calculate the probability of that student having the GPA they did, given that they were accepted.

If a feature is real-valued, we model its likelihood with a Normal distribution; if it is binary, with a Bernoulli distribution; and if it is categorical, with a multinoulli (categorical) distribution. We can use a different distribution for each variable within the same model.
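As a rough illustration of what each of those choices looks like in R (the numbers and names here are hypothetical, just to show the density functions involved):

# Real-valued feature (e.g. GPA): Normal density with a class's mean and sd.
dnorm(3.4, mean = 3.5, sd = 0.4)

# Binary feature: Bernoulli mass, i.e. a binomial with size = 1.
dbinom(1, size = 1, prob = 0.7)

# Categorical feature: look up the class-conditional proportion directly.
rank_probs <- c("1" = 0.25, "2" = 0.40, "3" = 0.25, "4" = 0.10)
rank_probs[["2"]]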

Posterior – The prior times the likelihood, normalized. In math, this looks like the prior times the likelihood, divided by the probability of the vector x. When we’re writing a Naive Bayes algorithm, however, we don’t care about the normalizing, i.e. dividing by the probability of the vector x. Because P(x) is the same for every class, it has no impact on our classification, so we can ignore the denominator in both of the equations below.

\[
P(A \mid x) = \frac{P(x \mid A)\,P(A)}{P(x)}
\]

\[
P(R \mid x) = \frac{P(x \mid R)\,P(R)}{P(x)}
\]

where A = accepted and R = rejected.

When calculating the likelihood, we have to do it for each feature. We can simply multiply the per-feature probabilities together, because the model assumes the features are conditionally independent given the class, a strong and likely false assumption. This is why the algorithm is called “naive”.
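Written out, for a feature vector x = (x_1, ..., x_d), the naive assumption lets the class-conditional likelihood factor into per-feature terms:

\[
P(x \mid A) = \prod_{j=1}^{d} P(x_j \mid A)
\]

and likewise for the rejected class.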

Example

Which students get accepted to UCLA?

We have two classes – accepted and rejected – and three variables: GRE score, GPA, and rank. We can use each of these to write a classifier.

Use

Notes –

I’m using R. I chose it over Python for this one because the problem felt so statistical, but you could just as easily do this in Python.

I’ll give the important statistical ideas that lie in the model right here, but to do this yourself or to see every line of code, go to my github.

‘train’ is the training data, which I randomly sampled by class.
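For context, here’s a sketch of how such a split could be produced. The UCLA admissions data is commonly distributed as binary.csv with columns admit, gre, gpa, and rank; the URL and the 70/30 split below are my assumptions, not necessarily what this post used.

# Load the UCLA admissions data (columns: admit, gre, gpa, rank).
data <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")

# Sample 70% of each class for training; keep the rest for testing.
set.seed(1)
idx <- unlist(lapply(split(seq_len(nrow(data)), data$admit),
                     function(i) sample(i, round(0.7 * length(i)))))
train <- data[idx, ]
test  <- data[-idx, ]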

Prior

First we need to calculate the prior probabilities for each class. My favorite way to do this is with a subset. If you want to save memory, you can compress this into two lines.

[Screenshot: R code computing the prior probabilities for each class]

note – sorry it’s a picture and not code! I really dislike WordPress for code snippets.
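Since the picture may not render, here’s a minimal sketch of the idea, assuming train has an admit column coded 1 for accepted and 0 for rejected (my names, not necessarily the post’s exact code):

# Subset the training data by class.
accepted <- subset(train, admit == 1)
rejected <- subset(train, admit == 0)

# Prior = class count / total count.
prior_accepted <- nrow(accepted) / nrow(train)
prior_rejected <- nrow(rejected) / nrow(train)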

Likelihood

Next, we need to calculate the likelihood. This is a bit more complicated, as we have to use a Gaussian distribution for each variable. I made a dictionary (a named list in R) to keep my code concise, rather than defining three different means and standard deviations. I recommend you explore dictionary-like structures in R a bit.

[Screenshot: R code computing the Gaussian likelihoods]
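Again as a sketch of the idea rather than the post’s exact code: estimate a mean and standard deviation per feature per class, store them in a named list (the “dictionary”), and evaluate new points with dnorm. All names here are my own assumptions.

# Per-class, per-feature Gaussian parameters, stored in a named list.
features <- c("gre", "gpa", "rank")
params <- lapply(list(accepted = accepted, rejected = rejected), function(cls) {
  lapply(setNames(features, features), function(f) {
    list(mean = mean(cls[[f]]), sd = sd(cls[[f]]))
  })
})

# Likelihood of one student x under a class: product of per-feature densities.
likelihood <- function(x, class) {
  prod(sapply(features, function(f) {
    p <- params[[class]][[f]]
    dnorm(x[[f]], mean = p$mean, sd = p$sd)
  }))
}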

I go on to test this function and make sure it works properly. It does.

Posterior

And lastly, we need to predict for each value in the test data! We do this simply by taking the maximum posterior probability.

[Screenshot: R code computing the posteriors and predicting a class]
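A sketch of that step, reusing the pieces above (my naming again; the denominator P(x) is dropped, as discussed):

# Unnormalized posterior: prior * likelihood. Predict whichever is larger.
predict_class <- function(x) {
  post_accepted <- prior_accepted * likelihood(x, "accepted")
  post_rejected <- prior_rejected * likelihood(x, "rejected")
  if (post_accepted > post_rejected) 1 else 0
}

preds <- sapply(seq_len(nrow(test)), function(i) predict_class(test[i, ]))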

Then, we just test the accuracy. You’ll have to look at my github for that code.

Predictions
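For a rough idea of what that check looks like, accuracy is just the fraction of test rows where the prediction matches the label (assuming admit is the label column, as above):

# Accuracy: proportion of correct predictions on the test set.
mean(preds == test$admit)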

One last thing – It’s been a while since I’ve posted; I’ve been really busy with school. I’ll try to write more in the next few months! 🙂
