A Few Million Monkeys Randomly Recreate Every Work Of Shakespeare
All the world’s a stage,
And all the monkeys merely players;
They have their typos and their hits,
And one monkey in his time plays many parts,
His acts being 38 works of Shakespeare.
- Monkey As You Like It
Update: I created a new visualization of the monkeys’ data.
The monkeys accomplished their goal of recreating all 38 works of Shakespeare. The last work, The Taming Of The Shrew, was completed at 2 AM PST on October 6, 2011. This is the first time every work of Shakespeare has actually been randomly reproduced. Furthermore, this is the largest work ever randomly reproduced. It is one small step for a monkey, one giant leap for virtual primates everywhere. This page shows what day each work of Shakespeare was completed on.
The Million Monkeys project went viral, but not in the cool, apocalyptic way. It started on September 25, 2011 and went into full swing on September 26, 2011. On that day, over 25,000 unique visitors viewed the Million Monkeys project, 300 sites referred traffic, and people viewed it from 119 countries. This post contains some of my thoughts and reactions on going viral. If this article about going viral goes viral, it will create an infinite loop that will bring about the destruction of the world.
NOTE: I apologize in advance for having to use the term “go viral” so much, but that really explains the phenomenon.
I am proud to announce that I have open sourced the Million Monkeys project. The source code is available here.
This project originally started on August 21, 2011. Over the course of the project, over 7.5 trillion character groups have been randomly generated and checked, compared with the roughly 5.4 trillion (5,429,503,678,976) possible combinations.
If you would like to do a story, please contact me via the Contact page.
Thoughts on Going Viral
As I mentioned before, the Million Monkeys project went viral on September 26, 2011. This was partly due to me spending a few hours E-mailing every news outlet I could think of. Another part was people using Twitter and Facebook to promote the project. On that day alone, over 2,300 visitors came to the site through Facebook and Twitter.
The first round of the project had no recognition, even among my friends. I thought the concept was cool and I kept with it. During a conversation with a friend of mine, we came up with a new concept for the project.
I went back to the drawing board for the second round of the project with the ideas from the new concept. I started using a smaller group size, 9-character groups instead of 24-character groups. This would allow the project to complete without an infinite amount of resources. I added near real-time updates of the site so people could see the progress of the monkeys. I wanted people to be able to come back to the site to watch their favorite work being recreated. This round received some recognition and landed on the front pages of Fox News and Engadget.
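To give a feel for why the smaller group size matters, here is a rough back-of-the-envelope sketch. It assumes a 26-letter alphabet (a to z) and borrows the figure of 180 billion character groups per day quoted in the numbers section of this post. The function names are mine, for illustration only.

```python
ALPHABET_SIZE = 26                 # assumption: lowercase a-z only
GROUPS_PER_DAY = 180_000_000_000   # throughput figure quoted in this post


def keyspace(group_length: int) -> int:
    """Number of distinct character groups of the given length."""
    return ALPHABET_SIZE ** group_length


def days_to_generate(group_length: int) -> float:
    """Days needed to generate as many groups as there are combinations."""
    return keyspace(group_length) / GROUPS_PER_DAY


for length in (9, 24):
    print(f"{length}-character groups: {keyspace(length):,} combinations, "
          f"about {days_to_generate(length):.3g} days of generation")
```

This is only a comparison of scale. Random generation repeats itself, so producing that many groups does not guarantee that every combination shows up once.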
I knew I was on the right track. I was getting some media attention and people were starting to see the site. My plan was to do another media blitz once the monkeys completed their first work. I was hoping for an Associated Press article and, if I was lucky enough, the front page of Slashdot. I thought I had a good idea, but I had no illusions of the project going viral.
On Sunday night, September 25, 2011, I was reading through my RSS feeds on Google Reader. Some new Slashdot stories appeared and I dutifully started reading them. When I started reading about myself and my project, I thought I had clicked on the wrong feed or had erred in some fashion. I could not believe I was reading about myself on Slashdot after many years of reading it. My wife was next to me at the time and I tried to explain why I was so ecstatic to be on Slashdot. Explaining Slashdot to a non-geek is difficult, but I think she could see it was important to me. If the media blitz had died at that point, I would have been happy. It didn’t. Over the course of the next day, the story kept on gaining momentum, getting more news stories, and more hits on the website.
All glory is fleeting, and not everyone liked the project. I received my share of hate mail, hate comments, and hate blog posts. I was informed that I didn’t understand the Infinite Monkey Theorem (I do), that I was conning people (I’m not, the source code and data are available), and that the project was boring (beauty is in the eye of the beholder). Before you create a project on the Internet, you had better have a thick skin to put up with people’s comments. I responded to the people I thought were genuinely asking a question or those that seemed to be open to a discussion about the project. Most people responded and most people were nice.
You should create as many social objects as possible. I have several YouTube videos where I explain the project in various levels of detail. These YouTube videos, in turn, were embedded by various sites in their postings. The blog postings themselves were great social objects. I could see by the direct traffic that people were E-mailing the link to their friends. My Twitter feed allowed me to converse with people who had questions about the project. It also allowed me to tweet the URLs of interviews, articles and radio shows about the project.
To gain the most media attention, make your project and/or post as media friendly as possible. Many of the sites wrote their articles using only the posts as source material. I put a lot of effort into making the site as straightforward and as quotable as possible. When doing a technical project like this, not all of your readers will be technically minded people. I recommend creating sections for technical and non-technical people. The non-technical people may glaze over at a very technical explanation of your project, and a technical person will want more technical detail.
The site itself needs to be technically ready for a huge increase in traffic. Many sites go down during a Slashdotting. Fortunately for me, DreamHost kept my site going without stoppage. It’s usually too late to change your site once it goes viral. Make sure you have some metrics for your site to track the usage. In my case, I use Google Analytics for WordPress. Having a decent looking site also helps. If you are not a designer, use your good taste and find a good theme for your site. I used ElegantTheme’s Minimal theme for this site. To handle a Slashdotting, your site needs to be optimized. From the beginning of this project, I tried to optimize the site. The images showing the progress through Shakespeare were indexed PNGs. They provided the smallest file size and therefore the best scalability. Much to my lament, the comments are not working on this site. One of the CAPTCHA plugins I installed messed things up, and the comments are still not working even after I uninstalled all of them.
Make sure your site makes it as easy as possible to connect with your users socially. The previous posts did not have the Facebook likes and Tweets when they were on Engadget and Fox News. I made it more difficult than it should have been for people to tell their friends about the project. From the start of this round, I have had the “like” buttons for the major social players. The site’s traffic and the number of people “liking” it show how much better the story made its rounds this time.
Was It A Success?
I always do a postmortem at the end of every project. This is the Million Monkeys project postmortem. I think the project was a resounding success. It achieved its primary goal of recreating every work of Shakespeare. People saw my work. While I might have received over 25,000 unique visitors to my site, millions and millions of people read about my work on mainstream news, blogs, print and radio. My personal branding (which is what this website is) went through the roof. On Google, my site used to appear as the 45th link for the search term “jesse anderson”. Now, I have links 4-6. The top 3 spots belong to an anime character named Jesse Anderson (Andersen). The project also brought me recognition within my own company, Intuit.
This success was not the result of luck or random chance, but of countless hours of hard work. Even though the Million Monkeys project took 40-60 hours of my time to write, it took countless hours before that to become a better programmer and learn new technologies like Hadoop. A lot of time was also spent submitting the story and working with reporters on stories.
In a way, the Million Monkeys is the current culmination of this time spent.
A lot of reporters asked me what I wanted to accomplish with this project. For me it is performance art with monkeys and computers. I wanted to make it engaging and have people coming back to check the monkeys’ progress, so I did near real-time updates of the site. People did just that as was reflected through the usage logs. People were coming back and they were E-mailing it around to their friends. They were tweeting it and liking it on Facebook. I consider that the most gratifying part of the project; people enjoyed it.
As time went on, I began to anthropomorphise the monkeys more and more. Instead of thinking of them as a PRNG (pseudorandom number generator) and a computer program, I was talking about them as if they were really monkeys. I began to identify with them and think of them like a pet. Maybe I spent too much time curating their work.
Going back to thick skin, I have a list of people to contact to get approval of projects. If anyone wants this list before they start their project, please E-mail me so we can get their approval. It’s of utmost importance that any project contact them before starting any work.
Reading about yourself in the news is one of the craziest things that can happen to you. There is kind of a disembodied realization that it is you, but it does not feel like you did it. That first week seemed like it was a month long. I was doing a lot of interviews and every moment seemed like an eternity.
I could not get the local media in Reno to do any stories on the project. It was incredibly funny because I would E-mail them saying the project had been on BBC, CNN, etc., and I never even got a reply. I will take international coverage over local coverage any day, but it was funny that local didn’t follow international. Update: I finally got some local press.
Some More Numbers
The monkeys ran 180,000,000,000 character groups a day. An average iteration lasted 30 minutes 33 seconds and ran 5,000,000,000 character groups. The monkeys found 1,982,507 distinct character groups and those character groups were found 3,788,175 times for a ratio of 1.8718555. The monkeys ran 7,445,912,000,000 total character groups out of the 5,429,503,678,976 possible combinations for a ratio of 1.3713.
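For anyone who wants to play with these figures, the short sketch below derives a few more numbers from them. It assumes the possible combinations are 9 letters drawn from a to z (26 to the 9th power) and that a day is made up of whole iterations. The variable names are mine.

```python
from datetime import timedelta

# Figures quoted in the paragraph above.
GROUPS_PER_DAY = 180_000_000_000
GROUPS_PER_ITERATION = 5_000_000_000
ITERATION_TIME = timedelta(minutes=30, seconds=33)
TOTAL_GROUPS = 7_445_912_000_000

# Assumption: each group is 9 letters from a 26-letter alphabet.
POSSIBLE_COMBINATIONS = 26 ** 9

iterations_per_day = GROUPS_PER_DAY / GROUPS_PER_ITERATION
compute_time_per_day = ITERATION_TIME * iterations_per_day
groups_per_second = GROUPS_PER_ITERATION / ITERATION_TIME.total_seconds()
generated_vs_possible = TOTAL_GROUPS / POSSIBLE_COMBINATIONS

print(f"possible combinations: {POSSIBLE_COMBINATIONS:,}")
print(f"iterations per day:    {iterations_per_day:.0f}")
print(f"compute time per day:  {compute_time_per_day}")
print(f"groups per second:     {groups_per_second:,.0f}")
print(f"generated / possible:  {generated_vs_possible:.4f}")
```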
There are 2 technologies I think set the Million Monkeys project apart from previous endeavors. The first is Hadoop, which scales well and can handle exponentially large problems like the Infinite Monkey Theorem. The second is a Bloom Filter. I ran a test last night comparing the Bloom Filter speed to String.indexOf. The Bloom Filter took 25 seconds to check every work of Shakespeare, and I stopped the String.indexOf run after 2 hours. If not for the Bloom Filter, the project would not be anywhere near the number of character groups it has processed and would be far from complete. I think the same would be true with Lucene or Sphinx, although the slowdown would not be as bad.
This project comes from one of my favorite Simpsons episodes, which has a scene where Mr. Burns brings Homer to his mansion (YouTube Video). One of his rooms has a thousand monkeys at a thousand typewriters. One of the monkeys writes a slightly incorrect line from Charles Dickens: “It was the best of times, it was the blurst of times.” The joke is a play on the theory (aka the Infinite Monkey Theorem) that a million monkeys sitting at a million typewriters will eventually produce Shakespeare. And that is what I did. I created millions of monkeys on Amazon EC2 (and later on my home computer) and put them at virtual typewriters.
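For readers who have not met a Bloom Filter before, here is a minimal sketch of the idea in Python. It is not the code from the project. The class, the bit count, the number of hashes, and the choice to index every 9-character window of the text are all assumptions made for illustration. The point is that a membership check touches a handful of bits instead of scanning the whole text.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter; the sizes below are arbitrary assumptions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive several bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_filter(text: str, group_length: int = 9) -> BloomFilter:
    """Add every window of group_length characters in text to a filter."""
    bloom = BloomFilter(num_bits=1 << 16, num_hashes=5)
    for start in range(len(text) - group_length + 1):
        bloom.add(text[start:start + group_length])
    return bloom


text = "tobeornottobethatisthequestion"
bloom = build_filter(text)
print(bloom.might_contain("tobeornot"))  # True: added items are never missed
print(bloom.might_contain("zzzzzzzzz"))  # almost certainly False
```

Because a Bloom Filter can report false positives but never false negatives, a "possibly present" answer still needs an exact check against the text afterwards.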
Less Technical Explanation
Instead of having real monkeys typing on keyboards, I have virtual, computerized monkeys that output random gibberish. This is supposed to mimic a monkey randomly mashing the keys on a keyboard. The computer program I wrote compares that monkey’s gibberish to every work of Shakespeare to see if it actually matches a small portion of what Shakespeare wrote. If it does match, the portion of gibberish that matched Shakespeare is marked with green in the images below to show it was found by a monkey. The table below shows the exact number of characters and percentage the monkeys have found in Shakespeare. The parts of Shakespeare that have not been found are colored white. This process is repeated over and over until the monkeys have created every work of Shakespeare through random gibberish.
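For the technically curious, the marking step can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the project's code: the function names are made up, the text is a short stand-in for Shakespeare, and a character counts as found when any matched 9-character group overlaps it.

```python
def mark_found(text: str, found_groups: set[str], group_length: int = 9) -> list[bool]:
    """Return one flag per character: True if a found group covers it."""
    covered = [False] * len(text)
    for start in range(len(text) - group_length + 1):
        if text[start:start + group_length] in found_groups:
            for i in range(start, start + group_length):
                covered[i] = True
    return covered


def render(text: str, covered: list[bool]) -> str:
    """Show found characters in upper case (standing in for the green pixels)."""
    return "".join(ch.upper() if hit else ch for ch, hit in zip(text, covered))


text = "tobeornottobethatisthequestion"
covered = mark_found(text, {"tobeornot", "thatisthe"})
percent = 100 * sum(covered) / len(text)

print(render(text, covered))     # TOBEORNOTtobeTHATISTHEquestion
print(f"{percent:.1f}% found")   # 60.0% found
```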
For this project, I used Hadoop, Amazon EC2, and Ubuntu Linux. Since I don’t have real monkeys, I have to create fake Amazonian Map Monkeys. The Map Monkeys create random data in ASCII between a and z. They use Sean Luke’s Mersenne Twister to make sure I have fast, random, well behaved monkeys. Once the monkey’s output is mapped, it is passed to the reducer, which runs the characters through a Bloom Filter membership test. If the monkey output passes the membership test, the Shakespearean works are checked using a string comparison. If that passes, a genius monkey has written 9 characters of Shakespeare. The source material is all of Shakespeare’s works as taken from Project Gutenberg.
For the curious, the computer I ran the monkeys on is a Core 2 Duo 2.66 GHz with 4 GB RAM running Ubuntu 10.10 64-bit.
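The flow described above (random groups, a cheap membership test, then an exact string check) can be sketched without Hadoop. What follows is a simplified single-machine illustration, not the project's code. The function names are hypothetical, a set of CRC32 buckets stands in for the Bloom Filter, and Python's built-in `random.Random` (which is also a Mersenne Twister) stands in for Sean Luke's Java implementation. The demo uses 2-character groups on a short text so that matches actually turn up.

```python
import random
import string
import zlib

NUM_BUCKETS = 4096  # assumption: size of the stand-in prefilter


def map_monkey(rng: random.Random, group_length: int, count: int):
    """'Map' step: yield random lowercase groups, one per monkey keystroke run."""
    for _ in range(count):
        yield "".join(rng.choice(string.ascii_lowercase)
                      for _ in range(group_length))


def bucket(group: str) -> int:
    return zlib.crc32(group.encode("ascii")) % NUM_BUCKETS


def build_prefilter(text: str, group_length: int) -> set[int]:
    """Record the bucket of every window in the text (false positives possible)."""
    return {bucket(text[i:i + group_length])
            for i in range(len(text) - group_length + 1)}


def reduce_groups(groups, text: str, prefilter: set[int]) -> set[str]:
    """'Reduce' step: cheap membership test first, exact comparison second."""
    found = set()
    for group in groups:
        if bucket(group) not in prefilter:
            continue  # definitely not in the text, skip the expensive check
        if group in text:  # exact check weeds out prefilter false positives
            found.add(group)
    return found


text = "tobeornottobethatisthequestion"
group_length = 2  # the project used 9; 2 keeps the demo fast
rng = random.Random(2011)
prefilter = build_prefilter(text, group_length)
found = reduce_groups(map_monkey(rng, group_length, 5000), text, prefilter)
print(sorted(found))
```

The ordering is the design choice that matters here: the exact comparison only runs for the small fraction of groups that survive the cheap test.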
A Few Words To Try and Prevent The Usual Comments
I realize there are different interpretations of this saying/theorem and I have done 2 different ones already. I understand the definition of infinite and the Infinite Monkey Theorem, and I realize that this project does not have infinite resources. This project was funded and written by me and was not supported by any grant money or federal money. No monkeys were harmed during the making of this code. This project is my attempt to find a creative way to attain an answer without infinite resources. It is a fun side project. If you still feel angry or slighted or feel the need to set me straight, please read this sign:
Thanks to my wife Sara, daughter Ashley, David Weinberg, Ryan Polk, and Tim Dailey.