A Few Million Monkeys Randomly Recreate Shakespeare

Friends, Romans, countrymen, lend me your ears;
I come to recreate Shakespeare, not to praise him.
Monkey Julius Caesar

Update 1: The monkeys recreated every work of Shakespeare and went viral. See the project postmortem for my thoughts on going viral and what I learned during the project.

Update 2: I created a new visualization of the monkeys’ data.

Today (2011-09-23) at 2:30 PST the monkeys successfully randomly recreated A Lover’s Complaint. This is the first time a work of Shakespeare has actually been randomly reproduced. Furthermore, this is the largest work ever randomly reproduced. It is one small step for a monkey, one giant leap for virtual primates everywhere.

Since then, the monkeys have also recreated The Tempest (2011-09-26), As You Like It (2011-09-28), Love’s Labour’s Lost (2011-09-29), Much Ado About Nothing (2011-09-29), The Merchant Of Venice (2011-09-29), The Sonnets (2011-09-29), The Third Part Of King Henry The Sixth (2011-09-29), The Two Gentlemen Of Verona (2011-09-29), A Midsummer Night’s Dream (2011-09-30), As You Like It (2011-09-30), The Life Of King Henry The Fifth (2011-09-30), The First Part Of Henry The Sixth (2011-09-30), The Tragedy Of Titus Andronicus (2011-09-30), The Winter’s Tale (2011-09-30), Measure for Measure (2011-10-01), The First Part Of King Henry The Fourth (2011-10-01), The History Of Troilus And Cressida (2011-10-01), Cymbeline (2011-10-02), King Richard The Second (2011-10-02), The Comedy Of Errors (2011-10-02), The Life Of Timon Of Athens (2011-10-02), The Tragedy Of Macbeth (2011-10-02), The Tragedy Of Othello Moor Of Venice (2011-10-02), Twelfth Night Or What You Will (2011-10-02), All’s Well That Ends Well (2011-10-03), King Henry The Eighth (2011-10-03), The Second Part Of King Henry The Sixth (2011-10-03), The Tragedy Of Hamlet Prince Of Denmark (2011-10-03), The Tragedy Of Julius Caesar (2011-10-03), The Tragedy Of Romeo And Juliet (2011-10-03), King John (2011-10-04), King Richard III (2011-10-04), Second Part Of King Henry IV (2011-10-04), The Tragedy Of Antony And Cleopatra (2011-10-04), The Tragedy Of Coriolanus (2011-10-04), The Tragedy Of King Lear (2011-10-04), and The Taming Of The Shrew (2011-10-06).

The monkeys will continue typing away until every work of Shakespeare is randomly created. Until then, you can continue to view the monkeys’ progress on that page. I am making the raw data available to anyone who wants it. Please use the Contact page to ask for the URL. If you have a Hadoop cluster that I could run the monkeys project on, please contact me as well.

This project originally started on August 21, 2011. Over the course of the project, over 6.5 trillion character groups have been randomly generated and checked. There are about 5.5 trillion possible combinations.
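The 5.5 trillion figure looks consistent with counting every 9-letter group over the letters a to z, which is how the technical explanation in this post describes a match. Here is a quick check of that arithmetic, as a sketch under that assumption (the variable names are mine, not the project's):

```python
# Assumption: a "character group" is 9 letters drawn from a-z,
# as described in the technical explanation of this post.
ALPHABET_SIZE = 26
GROUP_LENGTH = 9

possible_groups = ALPHABET_SIZE ** GROUP_LENGTH
generated_groups = 6.5e12  # "over 6.5 trillion" from the post

print(f"possible 9-letter groups: {possible_groups:,}")
print(f"generated / possible: {generated_groups / possible_groups:.2f}")
```

Under that assumption the count is 5,429,503,678,976, or about 5.4 trillion, so the 5.5 trillion in the post reads as a rounded figure. The monkeys generated roughly 1.2 times as many groups as there are distinct combinations, which is expected for random typing because the same group can come up more than once.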

So far, the project has appeared on Slashdot, Fox News, Engadget, Japanese Engadget, and Solidot. Radio interviews so far: Australian Broadcasting Company, Little Tommy, Jeff and Jer in San Diego, and Radio New Zealand. If you would like to do a story, please contact me via the Contact page.

The Inspiration

This project comes from one of my favorite Simpsons episodes, which has a scene where Mr. Burns brings Homer to his mansion (YouTube Video). One of his rooms has a thousand monkeys at a thousand typewriters. One of the monkeys writes a slightly incorrect line from Charles Dickens: “It was the best of times, it was the blurst of times.” The joke is a play on the theory that a million monkeys sitting at a million typewriters will eventually produce Shakespeare (aka the Infinite Monkey Theorem). And that is what I did. I created millions of monkeys on Amazon EC2 (and later on my home computer) and put them at virtual typewriters.

Less Technical Explanation

Instead of having real monkeys typing on keyboards, I have virtual, computerized monkeys that output random gibberish. This is supposed to mimic a monkey randomly mashing the keys on a keyboard. The computer program I wrote compares that monkey’s gibberish to every work of Shakespeare to see if it actually matches a small portion of what Shakespeare wrote. If it does match, the portion of gibberish that matched Shakespeare is marked with green in the images below to show it was found by a monkey. The table below shows the exact number of characters and percentage the monkeys have found in Shakespeare. The parts of Shakespeare that have not been found are colored white. This process is repeated over and over until the monkeys have created every work of Shakespeare through random gibberish.

Technical Explanation

For this project, I used Hadoop, Amazon EC2, and Ubuntu Linux. Since I don’t have real monkeys, I had to create fake Amazonian Map Monkeys. The Map Monkeys create random data in ASCII between a and z. They use Sean Luke’s Mersenne Twister to make sure I have fast, random, well-behaved monkeys. Once a monkey’s output is mapped, it is passed to the reducer, which runs the characters through a Bloom filter membership test. A Bloom filter can report false positives but never false negatives, so if the monkey’s output passes the membership test, the Shakespearean works are checked using a string comparison. If that passes, a genius monkey has written 9 characters of Shakespeare. The source material is all of Shakespeare’s works as taken from Project Gutenberg.
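To make the pipeline concrete, here is a minimal single-machine sketch in Python. It is not the project's Hadoop code and it does not use Sean Luke's Java class. Python's built-in `random` module is itself a Mersenne Twister, so it stands in for the monkeys. The names `BloomFilter`, `normalize`, `character_groups`, `monkey_output`, and `is_shakespeare` are hypothetical. The sketch also assumes the source text is lowercased, stripped of everything except a-z, and cut into overlapping 9-letter windows. The post does not say how the real project prepared the text.

```python
import hashlib
import random
import string

GROUP_LENGTH = 9  # the post says a match is 9 characters long


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def normalize(text):
    """Assumed preprocessing: lowercase and keep only the letters a-z."""
    return "".join(ch for ch in text.lower() if ch in string.ascii_lowercase)


def character_groups(text, length=GROUP_LENGTH):
    """All overlapping windows of `length` letters in the normalized text."""
    letters = normalize(text)
    return {letters[i:i + length] for i in range(len(letters) - length + 1)}


def monkey_output(rng, length=GROUP_LENGTH):
    """One monkey keystroke burst: `length` random letters from a-z."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def is_shakespeare(candidate, bloom, groups):
    """Cheap Bloom filter test first, then an exact check to rule out false positives."""
    return candidate in bloom and candidate in groups
```

To try it, build `groups = character_groups(some_text)`, add each group to a `BloomFilter`, and feed `monkey_output(random.Random())` into `is_shakespeare` in a loop. With about 5.4 trillion possible 9-letter groups, expect a very long wait for a hit on a small text, which is why the real project ran on a cluster.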

The monkeys’ data from Amazon’s cloud is updated on this site every 30 minutes. The images below show green for every character group that has been found and white for those that are still missing. The image output is kind of like the animations in defrag utilities. As the monkeys progress through the works, more and more character groups will be found and shown in green.
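The found/missing bookkeeping behind images like these can be sketched as a list of flags, one per letter of a work. This is an illustration only: the function names are hypothetical, tracking per letter rather than per character group is my simplification, and the text output uses `#` for green and `.` for white.

```python
def mark_found(letters, found, group):
    """Flag every occurrence of `group` in `letters`; return how many letters were newly flagged."""
    newly_marked = 0
    start = letters.find(group)
    while start != -1:
        for i in range(start, start + len(group)):
            if not found[i]:
                found[i] = True
                newly_marked += 1
        # Continue one position later so overlapping occurrences are also seen.
        start = letters.find(group, start + 1)
    return newly_marked


def percent_complete(found):
    """Share of letters flagged as found, as a percentage."""
    return 100.0 * sum(found) / len(found) if found else 0.0


def render(found, width=40):
    """Defrag-style text picture: '#' for found letters, '.' for missing ones."""
    cells = "".join("#" if flag else "." for flag in found)
    return "\n".join(cells[i:i + width] for i in range(0, len(cells), width))
```

For a work stored as a normalized string `letters`, start with `found = [False] * len(letters)`, call `mark_found` each time a monkey's group passes the checks, and redraw with `render(found)` whenever the site refreshes.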

This chart shows the total number of character groups as more and more iterations of the checks are run.

This chart shows percent complete as more and more iterations are run for each story.

For the curious, the computer I ran the monkeys on is a Core 2 Duo 2.66 GHz with 4 GB RAM running Ubuntu 10.10 64-bit.

A Few Words To Try and Prevent The Usual Comments

I realize there are different interpretations of this saying/theorem, and I have already done 2 different ones. I understand the definition of infinite and of the infinite monkey theorem, and I realize that this project does not have infinite resources. This project was funded and written by me and was not supported by any grant money or federal money. No monkeys were harmed during the making of this code. This project is my attempt to find a creative way to attain an answer without infinite resources. It is a fun side project. If you still feel angry or slighted or feel the need to set me straight, please read this sign:

