Naïve Bayes
There is nothing naïve about Naïve Bayes: it is a very basic but extremely efficient data-mining method for making decisions when a vast amount of data is available. The name comes from the fact that it is the simplest application of Bayes' rule to this problem, upon the (naïve) assumption that the events involved are independent. It is based on Bayes' rule of conditional probability: if you have a hypothesis H and evidence E that bears on that hypothesis, then

P(H|E) = P(E|H) · P(H) / P(E),

where, as usual, P(A) denotes the probability of the event A, and P(A|B) denotes the probability of the event A conditional to another event B.
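As a quick sanity check of the rule, here is the formula evaluated with made-up numbers (the probabilities below are hypothetical, nothing here comes from the movie data yet):

```python
# Bayes' rule with hypothetical values: P(H|E) = P(E|H) * P(H) / P(E)
p_H = 0.3          # prior probability of the hypothesis
p_E_given_H = 0.8  # probability of the evidence if the hypothesis holds
p_E = 0.5          # total probability of the evidence
p_H_given_E = p_E_given_H * p_H / p_E
print(round(p_H_given_E, 2))  # 0.48
```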
Let me show an example of this technique with, of course, yet another decision-making algorithm aimed at guessing my reaction to a movie I have not seen before. From the data obtained in a previous post, I create a simpler table with only those movies that have been scored more than 28 times (by a pool of 87 of the most popular critics featured in www.metacritics.com). I posted the script that creates this table at the end of the post.
Let’s test it:
>>> len(table)
49
>>> [entry[0] for entry in table]
['rabbit-hole', 'carnage-2011', 'star-wars-episode-iii---revenge-of-the-sith',
'shame', 'brokeback-mountain', 'drive', 'sideways', 'salt',
'million-dollar-baby', 'a-separation', 'dark-shadows',
'the-lord-of-the-rings-the-return-of-the-king', 'true-grit', 'inception',
'hereafter', 'master-and-commander-the-far-side-of-the-world', 'batman-begins',
'harry-potter-and-the-deathly-hallows-part-2', 'the-artist', 'the-fighter',
'larry-crowne', 'the-hunger-games', 'the-descendants', 'midnight-in-paris',
'moneyball', '8-mile', 'the-departed', 'war-horse',
'the-lord-of-the-rings-the-fellowship-of-the-ring', 'j-edgar',
'the-kings-speech', 'super-8', 'robin-hood', 'american-splendor', 'hugo',
'eternal-sunshine-of-the-spotless-mind', 'the-lovely-bones', 'the-tree-of-life',
'the-pianist', 'the-ides-of-march', 'the-quiet-american', 'alexander',
'lost-in-translation', 'seabiscuit', 'catch-me-if-you-can', 'the-avengers-2012',
'the-social-network', 'closer', 'the-girl-with-the-dragon-tattoo-2011']
>>> table[0]
['rabbit-hole', '', 'B+', 'B', '', 'C', 'C+', '', 'F', 'B+', 'F', 'C', 'F', 'D',
'', '', 'A', '', '', '', '', 'B+', 'C+', '', '', '', '', '', '', 'C+', '', '',
'', '', '', '', 'A', '', '', '', '', '', 'A', '', '', 'B+', 'B+', 'B', '', '',
'', 'D', 'B+', '', '', 'C+', '', '', '', '', '', '', 'B+', '', '', '', '', '',
'', 'A', '', '', '', '', '', '', '', 'D', '', '', 'C+', 'A', '', '', '', 'C+', '']
Note the titles of the movies: I have seen only some of them, and I score those below with the same grading scheme:
'star-wars-episode-iii---revenge-of-the-sith' | B |
'brokeback-mountain' | B |
'salt' | B |
'million-dollar-baby' | A |
'the-lord-of-the-rings-the-return-of-the-king' | C+ |
'true-grit' | B+ |
'inception' | A |
'batman-begins' | A |
'harry-potter-and-the-deathly-hallows-part-2' | D |
'the-fighter' | B+ |
'the-hunger-games' | C+ |
'the-descendants' | C+ |
'moneyball' | B+ |
'8-mile' | B |
'war-horse' | B+ |
'the-lord-of-the-rings-the-fellowship-of-the-ring' | B |
'the-kings-speech' | B |
'super-8' | A |
'robin-hood' | B |
'eternal-sunshine-of-the-spotless-mind' | C+ |
'the-tree-of-life' | F |
'the-pianist' | B+ |
'lost-in-translation' | B+ |
'catch-me-if-you-can' | B |
'the-avengers-2012' | C+ |
'the-girl-with-the-dragon-tattoo-2011' | B+ |
I will call this list my "evidence," and denote it by E. With this list as training data, let us focus on a movie I have not watched yet, say 'hugo'. If I want to compute the probability of me scoring this movie an A, I need to do so conditional to the evidence, so that by Bayes' rule it should be

P('hugo'=A | E) = P(E | 'hugo'=A) · P('hugo'=A) / P(E).
But the evidence is the combination of the individual scores of a bunch of movies; if we assume that these pieces of evidence are independent, then their combined probability is obtained by multiplying the individual probabilities, like so:

P(E | 'hugo'=A) = P('star-wars-episode-iii---revenge-of-the-sith'=B | 'hugo'=A) · P('brokeback-mountain'=B | 'hugo'=A) ⋯ P('the-girl-with-the-dragon-tattoo-2011'=B+ | 'hugo'=A).
We need not worry about the probability P(E): it gets eliminated in a final normalizing step, when we make all the probabilities add up to one:

P('hugo'=A | E) = P(E | 'hugo'=A) · P('hugo'=A), divided by the sum, over all possible grades g, of P(E | 'hugo'=g) · P('hugo'=g).
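The two steps above can be sketched numerically. The likelihoods and priors below are made up for illustration (and restricted to two grades); the point is only the mechanics: multiply the per-movie probabilities, weight by the prior, then normalize so that P(E) cancels out:

```python
import math

# hypothetical per-movie likelihoods P(movie_i = my score | 'hugo' = grade)
likelihoods = {"A": [0.5, 0.2, 0.4], "B": [0.1, 0.3, 0.2]}
priors = {"A": 0.15, "B": 0.25}  # made-up prior probabilities

# independence assumption: combined likelihood is the product
unnormalized = {g: math.prod(likelihoods[g]) * priors[g] for g in priors}
total = sum(unnormalized.values())  # this plays the role of P(E)
posterior = {g: p / total for g, p in unnormalized.items()}
print(posterior)  # the values add up to one
```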
All the probabilities in the product on the right-hand side of the previous equation are computed from the table in an obvious way (by counting!). For instance, to compute P('salt'=B | 'hugo'=A), we go over the data in the table regarding both movies:
>>> table[7]
['salt', 'A', 'C+', '', 'C', 'F', 'C+', '', '', 'F', 'F', '', 'B', 'B', 'B', '',
'B', '', '', '', '', 'D', 'C+', '', '', 'C', '', '', '', 'C+', '', '', '', '',
'', '', 'F', '', '', '', '', 'D', 'C+', '', '', 'F', '', 'F', '', '', '', 'A',
'', '', 'D+', 'F', 'D+', '', 'D', '', '', '', 'F', '', '', '', '', '', '', 'B',
'', '', '', '', 'F', '', '', '', '', '', '', '', '', '', '', '', '']
>>> table[34]
['hugo', 'A', 'B+', 'F', 'A', '', '', '', 'F', '', 'F', '', '', 'C', 'A', '', 'A',
'A', '', '', '', 'B+', 'C+', '', '', '', 'A', '', '', 'C+', '', '', '', '', '',
'', 'B', '', '', '', '', 'A', '', '', '', 'B+', '', 'B', 'A', '', '', '', 'A',
'', 'A', '', '', '', 'C+', '', '', '', 'C+', '', '', '', '', '', '', 'A', '', '',
'', '', 'D', 'A', 'A', '', '', '', 'B+', '', '', '', '', '', '']
>>> len([[table[7][i],table[34][i]] for i in range(len(table[7])) \
if table[7][i]=="B" and table[34][i]=="A"])
3
Of the 87 critics, only 3 gave 'salt' a B while giving 'hugo' an A. Since 13 critics in total gave 'hugo' an A, according to this scheme,

P('salt'=B | 'hugo'=A) = 3/13.

Simple, right?
'salt'= | A | B+ | B | C+ | C | D+ | D | F | no-score | TOTAL |
'hugo'=A | 1 | 0 | 3 | 0 | 1 | 1 | 1 | 0 | 6 | 13 |
Note that an occurrence of a zero in one of these probabilities makes our algorithm go "boom": a single zero factor wipes out the whole product. To solve this issue, we apply a Laplace estimator: we always add one to the numerators of the probabilities, and compensate by adding as much as necessary to the denominators. In this case, since we have 9 different classes (A, B+, B, C+, C, D+, D, F and no-score), we add nine to the denominators. We would then have

P('salt'=B | 'hugo'=A) = (3 + 1)/(13 + 9) = 4/22.
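A minimal sketch of this smoothed count-to-probability step (the function name is mine, not from the post's script):

```python
# Laplace estimator: add 1 to the count and the number of classes (9 here:
# A, B+, B, C+, C, D+, D, F and no-score) to the denominator, so that no
# conditional probability is ever exactly zero.
def laplace_probability(count, total, num_classes=9):
    return (count + 1) / (total + num_classes)

# P('salt'=B | 'hugo'=A) with the counts from the table above:
print(laplace_probability(3, 13))  # 4/22
```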
Similarly, for the prior probability (the one for which we do not know any piece of evidence) we have, with no need for a Laplace estimator here,

P('hugo'=A) = 13/87,

since 13 of the 87 critics gave 'hugo' an A.
This is now a fully Bayesian formulation in which prior probabilities have been assigned to everything in sight. It is completely rigorous, except for the fact that, in our case, independence of the pieces of evidence is not actually guaranteed.
At this point, I leave the coding of this algorithm as an exercise. According to this scheme, which score for the movie 'hugo' has the highest probability? That is the score this data-mining idea predicts I would give it after seeing it. Also, how do the results of this algorithm compare to those of my weighted averages?
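If you want to check your answer to the exercise, here is one possible way to organize the classifier. This is a sketch under the assumptions of the post (9 classes, Laplace estimator on the conditionals, plain counting for the prior); the function and variable names are mine, not the post's:

```python
GRADES = ["A", "B+", "B", "C+", "C", "D+", "D", "F", ""]  # "" = no-score

def predict(table, my_scores, target):
    """Posterior over my possible grades for `target`, given `my_scores`
    (a dict movie -> my grade). Rows of `table` are [movie, grade1, ...]."""
    rows = {row[0]: row[1:] for row in table}
    target_row = rows[target]
    num_critics = len(target_row)
    posterior = {}
    for grade in GRADES:
        total = target_row.count(grade)
        p = total / num_critics                       # prior, no smoothing
        for movie, my_grade in my_scores.items():
            row = rows[movie]
            count = sum(1 for a, b in zip(row, target_row)
                        if a == my_grade and b == grade)
            p *= (count + 1) / (total + len(GRADES))  # Laplace estimator
        posterior[grade] = p
    norm = sum(posterior.values())                    # normalize: P(E) cancels
    return {g: v / norm for g, v in posterior.items()}
```

The predicted score is then `max(posterior, key=posterior.get)` for `posterior = predict(table, my_scores, 'hugo')`.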
Python Script to create a table with all movies with at least N scores
# retrieve the data:
# the list critics with the names of the critics, and
# the corresponding list scoredMovies with dicts.
# Each dict in position j contains as keys the names of the movies scored
# and their scores by the featured critic in position j from the critics
# list
from movieData import *

# This is how to create a list with all the different movies scored
def collectMovies(scoredMovies):
    allMovies = set()
    for movies in scoredMovies:
        allMovies = allMovies | set(movies.keys())
    return list(allMovies)

# The following function retrieves all movies with at least N scores
def moviesWithMoreThanNScores(scoredMovies, N):
    allMovies = collectMovies(scoredMovies)
    truefalseMovieHelper = [[movie in criticWork for criticWork in scoredMovies]
                            for movie in allMovies]
    movieCount = [x.count(True) for x in truefalseMovieHelper]
    return [movie for index, movie in enumerate(allMovies)
            if movieCount[index] >= N]

# prepTable gives us a table with those movies with at least N scores,
# together with those scores (87 in total). If the critic in position "j"
# happened not to score a movie, the corresponding "j" entry for that movie
# is an empty string. Note that we turn numerical values into grades
# (A, B+, B, C+, C, D+, D and F)
def prepTable(scoredMovies, N):
    def transformScore(criticScore, movie):
        if movie in criticScore:
            score = criticScore[movie]
            if score < 60:
                return "F"
            elif score < 65:
                return "D"
            elif score < 70:
                return "D+"
            elif score < 75:
                return "C"
            elif score < 80:
                return "C+"
            elif score < 85:
                return "B"
            elif score < 90:
                return "B+"
            else:
                return "A"
        else:
            return ""

    tableMovies = moviesWithMoreThanNScores(scoredMovies, N)
    table = []
    for movie in tableMovies:
        row = [movie]
        for criticScore in scoredMovies:
            row.append(transformScore(criticScore, movie))
        table.append(row)
    return table