Update 5: The monkeys recreated every work of Shakespeare and went viral. See the project postmortem for my thoughts on going viral and what I learned during the project.
Update 6: I created a new visualization of the monkeys’ data.
Update 4: The monkeys recreated “A Lover’s Complaint”. Check out the write-up.
Update 3: Welcome Slashdot, Fox News, Engadget and Japanese Engadget. So far, the monkeys have run through 7.5 trillion character groups. Earlier counts: 6.5 trillion; 5 trillion (2011-09-22); 4 trillion (2011-09-16); 3 trillion (2011-09-10); 2.5 trillion (2011-09-07); 2 trillion (2011-09-05); 1.5 trillion (2011-09-01); 1 trillion (2011-08-28); 515,912,000,000 (2011-08-25).
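For scale, here is a rough calculation. It takes the group length (9 characters) and alphabet (a to z) from the technical explanation, and it assumes, purely for this estimate, that every character group is an independent, uniformly random 9-letter string. Under that assumption the number of distinct groups, and the chance that one particular group has appeared after N groups, are:

$$
26^{9} = 5{,}429{,}503{,}678{,}976 \approx 5.43 \times 10^{12}
$$

$$
P(\text{a given group has appeared after } N \text{ groups}) = 1 - \left(1 - 26^{-9}\right)^{N} \approx 1 - e^{-N/26^{9}}
$$

With N = 7.5 × 10^12, N/26^9 ≈ 1.381 and e^{-1.381} ≈ 0.251, so under this assumption any single 9-letter group has about a 75% chance of having been typed at least once.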
In a previous post, I described a project to recreate Shakespeare using Hadoop and Amazon EC2. This time, I am going to recreate every work of Shakespeare randomly.
This project comes from one of my favorite Simpsons episodes, which has a scene where Mr. Burns brings Homer to his mansion (YouTube video). One of his rooms has a thousand monkeys at a thousand typewriters. One of the monkeys types a slightly incorrect line from Charles Dickens: “It was the best of times, it was the blurst of times.” The joke plays on the Infinite Monkey Theorem, the idea that a million monkeys sitting at a million typewriters will eventually produce Shakespeare. That is what I set out to do, and am still doing: I created millions of monkeys on Amazon and put them at virtual typewriters.
Less Technical Explanation
Instead of having real monkeys typing on keyboards, I have virtual, computerized monkeys that output random gibberish, mimicking a monkey randomly mashing the keys on a keyboard. The program I wrote compares each monkey’s gibberish to every work of Shakespeare to see whether it matches a small portion of what Shakespeare wrote. If it does, the matching portion is colored green in the images below to show that a monkey found it; the parts of Shakespeare that have not been found are colored white. The table below shows the exact number of characters the monkeys have found in Shakespeare, along with the percentage. This process repeats over and over until the monkeys have recreated every work of Shakespeare through random gibberish.
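The bookkeeping behind the table can be sketched in a few lines of Python. This assumes the program records the start offset of each matched 9-character group within a work and counts every character covered by at least one match. The function name `coverage` and the offset-based design are illustrative assumptions, not the project’s actual code.

```python
def coverage(text: str, found_starts, group_len: int = 9):
    """Return (mask, count, percent) for the characters covered by found groups.

    Hypothetical helper: found_starts holds the offsets where a monkey's
    group matched the text; each match covers group_len characters.
    """
    covered = [False] * len(text)
    for start in found_starts:
        # Clamp at the end of the text so a late match cannot overrun it.
        for i in range(start, min(start + group_len, len(text))):
            covered[i] = True
    count = sum(covered)  # True counts as 1
    percent = 100.0 * count / len(text) if text else 0.0
    return covered, count, percent
```

For example, `coverage("tobeornottobe", [0])` marks the first 9 of 13 characters, giving a count of 9 and about 69.2%. Overlapping matches are counted once per character, so adding a match at offset 4 raises the count to 13 rather than 18.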
Technical Explanation
For this project, I used Hadoop, Amazon EC2, and Ubuntu Linux. Since I don’t have real monkeys, I created fake Amazonian Map Monkeys. The Map Monkeys generate random ASCII characters from a to z, using Sean Luke’s Mersenne Twister to make sure I have fast, random, well-behaved monkeys. Once the monkeys’ output is mapped, it is passed to the reducer, which runs the characters through a Bloom filter membership test. A Bloom filter can report false positives but never false negatives, so if the monkey output passes the membership test, it is then checked against the Shakespearean works using an exact string comparison. If that also passes, a genius monkey has written 9 characters of Shakespeare. The source material is all of Shakespeare’s works as taken from Project Gutenberg.
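Here is a single-process Python sketch of that map/reduce flow. It is not Hadoop code. It substitutes Python’s `random` module for Sean Luke’s Java library (CPython’s `random` is also an MT19937 Mersenne Twister) and uses a small hand-rolled Bloom filter. The class and function names, the filter sizing (about 10 bits per group, 7 hashes), the SHA-256 double hashing, and the `normalize` cleanup that keeps only a to z are all assumptions made for illustration.

```python
import hashlib
import random
import string

GROUP_LEN = 9  # the post says a match is 9 characters of Shakespeare


class BloomFilter:
    """A small Bloom filter: k bit positions per item in an m-bit array."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means "probably present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def normalize(text: str) -> str:
    # Hypothetical cleanup: keep only a-z so the source matches the monkeys' alphabet.
    return "".join(ch for ch in text.lower() if ch in string.ascii_lowercase)


def build_index(source: str):
    """Index every 9-letter window of the source in a Bloom filter and a set."""
    clean = normalize(source)
    groups = {clean[i:i + GROUP_LEN] for i in range(len(clean) - GROUP_LEN + 1)}
    bloom = BloomFilter(num_bits=max(64, 10 * len(groups)), num_hashes=7)
    for group in groups:
        bloom.add(group)
    return bloom, groups


def monkey_groups(seed: int, count: int):
    """Map-step stand-in: random 9-letter groups from a seeded Mersenne Twister."""
    rng = random.Random(seed)
    for _ in range(count):
        yield "".join(rng.choice(string.ascii_lowercase) for _ in range(GROUP_LEN))


def reduce_step(candidates, bloom, groups):
    """Reduce-step stand-in: cheap Bloom test first, exact comparison second."""
    found = set()
    for cand in candidates:
        if bloom.might_contain(cand) and cand in groups:
            found.add(cand)
    return found
```

With `bloom, groups = build_index("To be, or not to be")`, calling `reduce_step(["tobeornot", "zzzzzzzzz"], bloom, groups)` returns `{"tobeornot"}`. Even if the Bloom filter falsely lets `zzzzzzzzz` through, the exact lookup rejects it. In this toy version the set alone would suffice; the Bloom filter earns its place when the exact check is costlier, such as a search through the full texts.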
The monkeys’ data from Amazon’s cloud is updated on this site every 30 minutes. The images below show green for every character group that has been found and white for those still missing. The image output resembles the animations of disk defragmentation utilities: as the monkeys progress through the works, more and more character groups are found and turn green.
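The defrag-style images can be illustrated with a short sketch that lays out one pixel per character group, row by row, green if found and white if missing, and emits a plain-text PPM image. The post does not say how its images are drawn. The PPM format, the exact color values, the row width, the one-pixel-per-group layout, and the name `render_ppm` are all assumptions.

```python
GREEN = (0, 170, 0)      # found character group (assumed shade)
WHITE = (255, 255, 255)  # group not found yet


def render_ppm(found_flags, width: int) -> str:
    """Render one pixel per character group as a plain (P3) PPM image string."""
    height = (len(found_flags) + width - 1) // width  # ceiling division
    rows = []
    for r in range(height):
        pixels = []
        for c in range(width):
            i = r * width + c
            # Pixels past the last group are padding and stay white.
            color = GREEN if i < len(found_flags) and found_flags[i] else WHITE
            pixels.append("%d %d %d" % color)
        rows.append(" ".join(pixels))
    return "P3\n%d %d\n255\n%s\n" % (width, height, "\n".join(rows))
```

Saving the returned string to a `.ppm` file gives a viewable image. Rerunning it as more flags flip to `True` reproduces the effect of the grid gradually filling in with green.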
The Tabular Output Of What Has Been Found
Every Work Of Shakespeare
All Works of Shakespeare
Progress Through Individual Works Of Shakespeare
A Lover’s Complaint
Love’s Labour’s Lost
The Merchant Of Venice
The Tragedy Of Julius Caesar
A Midsummer Night’s Dream
Measure For Measure
The Merry Wives Of Windsor
The Tragedy Of King Lear
Much Ado About Nothing
The Tragedy Of Macbeth
All’s Well That Ends Well
The Sonnets
The Tragedy Of Othello, Moor Of Venice
As You Like It
The Comedy Of Errors
The Taming Of The Shrew
The Tragedy Of Romeo And Juliet
Cymbeline
The Tempest
The Tragedy Of Titus Andronicus
King Henry The Eighth
The First Part Of King Henry The Fourth
Second Part Of King Henry IV
The First Part Of Henry The Sixth
The Second Part Of King Henry The Sixth
The Third Part Of King Henry The Sixth
The Two Gentlemen Of Verona
King John
The History Of Troilus And Cressida
The Tragedy Of Antony And Cleopatra
The Winter’s Tale
King Richard III
The Life Of King Henry The Fifth
The Tragedy Of Coriolanus
Twelfth Night; Or, What You Will
King Richard The Second
The Life Of Timon Of Athens
The Tragedy Of Hamlet Prince Of Denmark
Update: I was running this on a free micro instance (600 MB RAM) from Amazon. Alas, the monkeys needed more RAM than the free micro instance had, and the processes hit out-of-memory errors. I have moved the Hadoop server to my home computer, which is much faster and has more memory.
Update 2: I updated the Hadoop configuration to reduce idle CPU time. This should significantly increase monkey power, so the monkeys find more character groups.
Update 4: I made a small change to how memory is allocated for the random character groups. It should help speed things up again.