Update 5: The monkeys recreated every work of Shakespeare and went viral. See the project postmortem for my thoughts on going viral and what I learned during the project.

Update 6: I created a new visualization of the monkeys’ data.

Update 4: The monkeys recreated “A Lover’s Complaint”. Check out the write-up.

Update 3: Welcome Slashdot, Fox News, Engadget and Japanese Engadget. So far, the monkeys have run through 7.5 trillion character groups. Earlier counts: 6.5 trillion, 5 trillion (2011-09-22), 4 trillion (2011-09-16), 3 trillion (2011-09-10), 2.5 trillion (2011-09-07), 2 trillion (2011-09-05), 1.5 trillion (2011-09-01), 1 trillion (2011-08-28), 515,912,000,000 (2011-08-25).

In a recent post, I described a project to recreate Shakespeare using Hadoop and Amazon EC2. This time, I am going to recreate every work of Shakespeare randomly.

This project comes from one of my favorite Simpsons episodes, which has a scene where Mr. Burns brings Homer to his mansion (YouTube Video). One of his rooms has a thousand monkeys at a thousand typewriters. One of the monkeys writes a slightly incorrect line from Charles Dickens: ‘It was the best of times, it was the blurst of times.’ The joke is a play on the Infinite Monkey Theorem, the idea that a million monkeys sitting at a million typewriters will eventually produce Shakespeare. And that is what I did (am doing). I created millions of monkeys on Amazon and put them at virtual typewriters.

Less Technical Explanation

Instead of having real monkeys typing on keyboards, I have virtual, computerized monkeys that output random gibberish. This is supposed to mimic a monkey randomly mashing the keys on a keyboard. The computer program I wrote compares that monkey’s gibberish to every work of Shakespeare to see if it actually matches a small portion of what Shakespeare wrote. If it does match, the portion of gibberish that matched Shakespeare is marked with green in the images below to show it was found by a monkey. The table below shows the exact number of characters and percentage the monkeys have found in Shakespeare. The parts of Shakespeare that have not been found are colored white. This process is repeated over and over until the monkeys have created every work of Shakespeare through random gibberish.

Technical Explanation

For this project, I used Hadoop, Amazon EC2, and Ubuntu Linux. Since I don’t have real monkeys, I have to create fake Amazonian Map Monkeys. The Map Monkeys create random ASCII data between a and z. They use Sean Luke’s Mersenne Twister to make sure I have fast, random, well-behaved monkeys. Once the monkeys’ output is mapped, it is passed to the reducer, which runs the characters through a Bloom filter membership test. If the monkey output passes the membership test, the Shakespearean works are checked using a string comparison. If that passes, a genius monkey has written 9 characters of Shakespeare. The source material is all of Shakespeare’s works as taken from Project Gutenberg.
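The pipeline described above (random a-to-z groups, a Bloom filter membership test, then an exact string comparison) can be sketched in a few lines of Python. This is only an illustration, not the project's Hadoop code: the `BloomFilter` class, its sizes, the SHA-256 based hashing, and the `normalize` step that strips everything except letters are all assumptions made for this sketch. Python's built-in `random` module happens to use a Mersenne Twister generator as well.

```python
import hashlib
import random
import string

GROUP_LEN = 9  # the post says a match is 9 characters long


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        # Sizes are arbitrary choices for this sketch, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 hash.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode("ascii")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def normalize(text):
    """Keep only the letters a-z, lowercased (assumed preprocessing)."""
    return "".join(ch for ch in text.lower() if ch in string.ascii_lowercase)


def character_groups(text, length=GROUP_LEN):
    """Every contiguous group of `length` characters in the normalized text."""
    clean = normalize(text)
    return {clean[i:i + length] for i in range(len(clean) - length + 1)}


def monkey_type(rng, length=GROUP_LEN):
    """One monkey mashing keys: `length` random letters between a and z."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def check(candidate, bloom, groups):
    """Cheap Bloom test first, then the exact comparison to rule out false positives."""
    return candidate in bloom and candidate in groups


# Usage: build the filter from a source text, then test monkey output.
groups = character_groups("To be, or not to be")
bloom = BloomFilter()
for group in groups:
    bloom.add(group)

rng = random.Random()
attempt = monkey_type(rng)
found = check(attempt, bloom, groups)
```

The point of the two-step check is cost: the Bloom filter rejects almost all gibberish with a few bit lookups, and the exact comparison only runs on the rare candidates that pass. With 26 letters and 9 positions there are 26^9 = 5,429,503,678,976 possible groups, so nearly every attempt is rejected at the first step.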

The monkeys’ data from Amazon’s cloud is updated on this site every 30 minutes. The images below show green for every character group that was found and white for those that are still missing. The image output is kind of like the animations for defrag utilities. As the monkeys progress through the works, more and more character groups will be found and show green.
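A rough sketch of how the found-versus-missing view could be computed follows. It is an assumption-laden illustration rather than the site's actual rendering code: the function names are hypothetical, the text is assumed to be already reduced to the letters a to z, and a character is treated as found when at least one found 9-character group covers it.

```python
def coverage_mask(clean_text, found_groups, length=9):
    """Return one boolean per character: True if a found group covers it."""
    mask = [False] * len(clean_text)
    for start in range(len(clean_text) - length + 1):
        if clean_text[start:start + length] in found_groups:
            for i in range(start, start + length):
                mask[i] = True
    return mask


def percent_found(mask):
    """Percentage of characters covered by at least one found group."""
    return 100.0 * sum(mask) / len(mask) if mask else 0.0


def render(mask, width=40):
    """Text stand-in for the image: '#' for green (found), '.' for white (missing)."""
    cells = "".join("#" if hit else "." for hit in mask)
    return "\n".join(cells[i:i + width] for i in range(0, len(cells), width))


# Usage: one found group out of a 13-letter text.
mask = coverage_mask("tobeornottobe", {"tobeornot"})
picture = render(mask, width=13)
```

Rerunning this after each batch of results would make the picture fill in over time, much like the defrag animation mentioned above.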

The Tabular Output Of What Has Been Found

Loading Results… (Will only work on jesse-anderson.com due to browser security restrictions, go here)

Every Work Of Shakespeare

All Works of Shakespeare

Progress Through Individual Works Of Shakespeare

A Lovers Complaint

Loves Labours Lost

The Merchant Of Venice

The Tragedy Of Julius Caesar

A Midsummer Nights Dream

Measure For Measure

The Merry Wives Of Windsor

The Tragedy Of King Lear

Much Ado About Nothing

The Tragedy Of Macbeth

Alls Well That Ends Well

The Sonnets

The Tragedy Of Othello Moor Of Venice

As You Like It

The Comedy Of Errors

The Taming Of The Shrew

The Tragedy Of Romeo And Juliet

Cymbeline

The Tempest

The Tragedy Of Titus Andronicus

King Henry The Eighth

The First Part Of King Henry The Fourth

Second Part Of King Henry IV

The First Part Of Henry The Sixth

The Second Part Of King Henry The Sixth

The Third Part Of King Henry The Sixth

The Two Gentlemen Of Verona

King John

The History Of Troilus And Cressida

The Tragedy Of Antony And Cleopatra

The Winters Tale

King Richard III

The Life Of King Henry The Fifth

The Tragedy Of Coriolanus

Twelfth Night Or What You Will

King Richard The Second

The Life Of Timon Of Athens

The Tragedy Of Hamlet Prince Of Denmark

Update: I was running this on a free micro instance (600 MB RAM) from Amazon. Alas, the monkeys needed more RAM than the free micro instance had, and the processes hit out-of-memory errors. I have moved the Hadoop server to my home computer, which is much faster and has more memory.

Update 2: I updated the Hadoop configuration to have less idle CPU time. This will significantly increase the monkey power and find more character groups.

Update 4: I made a small change to how memory is allocated for the random character groups. It should help speed things up again.
