We’re going to see a sharp uptick in Big Data and Hadoop companies, starting with Hortonworks’ recent IPO. Coverage of the IPO consistently mentions my former company, Cloudera, as being notably absent from the IPO chase.
I want to tell you about my experience at Cloudera. I was employee ~200 and I worked on the training team. My experiences were mostly positive. It’s the first company that I’ve ever left and would return to.
By far, Cloudera’s biggest asset is its people. I would characterize Clouderans, as Cloudera employees are called, as incredibly smart, humble, and dedicated. Your interactions are with the people who wrote the books on Hadoop or who founded and committed code to the projects. That caliber of talent could have brought massive egos with it, but it didn’t. I defy most people to pick the technical lead out of a group; you can’t do it based on ego and attitude.
A funny thing happened with new Cloudera employees. They were used to companies with a wider range of competence (usually skewing low), so they would begin their explanations assuming very little knowledge on the listener’s part. You quickly learn to assume a high level of technical competence when dealing with any Clouderan; otherwise, you’d be interrupted and asked to take it up a level. One result is that a former Clouderan is a hot commodity these days, and some even say so in their LinkedIn profile headlines.
While at Cloudera, I learned how a great business should be run. I had worked at other companies and seen how a poorly managed, undirected company operates. Cloudera, by and large, communicates well internally. There are twice-monthly meetings for the entire company, covering any new projects or company news. It isn’t just the CEO talking; it’s often an engineer, manager, or program manager telling the company about their project. You feel that you know the company’s goals and each team’s goals.
That isn’t to say Cloudera’s phenomenal growth is without its problems. I’ve heard the comparison that Cloudera is like a teenager in a growth spurt: there will be times when it smacks itself in the face with its gangly arms. It just happens sometimes.
If you’re looking at joining Cloudera, I say great, and prepare well. If you’re an investor reading up on Big Data or Hortonworks’ competitors, the future looks very bright for the technology, its success stories, and its use cases.
Full disclosure: I am a Cloudera shareholder.
I created a 7-minute video showing how HBase works, using playing cards.
My article on CEO.com was posted today. In it, I talk about ways to hire and interview your first software engineer, and I stress the importance of giving back to software groups as you use their help. Here is the guest post I wrote for startup communities discussing how to give back.
I am proud to announce my latest series of screencasts on Hadoop MapReduce. It’s published again by the good people at Pragmatic Programmers. These screencasts are the best way for a beginner to learn about Hadoop, unless they’re sitting in my class at Cloudera University.
Here are a few links to get you started after you’ve purchased the screencasts:
First, you want a way to run Hadoop, MapReduce and Eclipse. There is a virtual machine that is set up and running with everything you need. I have a mini-screencast showing how to use Eclipse and debug things.
Finally, you’ll need the dataset for the second episode. It uses the Nasdaq daily stock prices from InfoChimps.
The focus of the screencast isn’t administration and installation; it’s the developer side of things. The source code is written to run on the Cloudera QuickStart VM out of the box.
One of the common questions I get from students and developers relates to IDEs and MapReduce: how do you create a MapReduce project in Eclipse and debug it? I’ve created a short screencast showing you how.
Cloudera QuickStart VM
The Cloudera QuickStart VM lets developers get started with writing MapReduce code without having to worry about software installs and configuration. Everything is installed and ready to go. You can download the image type that corresponds to your preferred virtualization platform.
Eclipse is installed on the VM and there is a link on the desktop to start it.
MapReduce and Eclipse
You can run and debug MapReduce code in Eclipse just like any other Java program, but there are a few differences between running MapReduce on a distributed cluster and in an IDE like Eclipse. When you run MapReduce code in Eclipse, Hadoop runs in a special mode called LocalJobRunner. The entire job runs in a single JVM (Java Virtual Machine) instead of across the several JVMs of the Hadoop daemons. Another difference is that all file paths default to local file paths rather than HDFS paths.
With those caveats in mind, you can start putting in your breakpoints and debug your MapReduce code like any other Java program.
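As a sketch of why this works: Hadoop chooses LocalJobRunner and the local filesystem based on two standard configuration keys, and the local behavior is simply their default. The fragment below shows the MRv1-era key names (the generation the QuickStart VM of that time shipped with); treat the exact values as illustrative defaults rather than something you need to set.

```
<!-- mapred-site.xml: "local" (the default when unset) makes the client use
     LocalJobRunner, running the whole job in the current JVM rather than
     submitting it to a JobTracker on a cluster -->
<property>
  <name>mapred.job.tracker</name>
  <value>local</value>
</property>

<!-- core-site.xml: file:/// (the default when unset) resolves unqualified
     paths against the local filesystem instead of HDFS -->
<property>
  <name>fs.default.name</name>
  <value>file:///</value>
</property>
```

Because Eclipse runs your driver class without a cluster-side configuration on the classpath, both keys fall back to these defaults, which is why breakpoints and local input files just work.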
If you want to clone the same Git project I use in the screencast, you can find it here. From the terminal, type:
git clone firstname.lastname@example.org:eljefe6a/UnoExample.git
The project will be cloned into a subdirectory of the current directory.
Note that creating Eclipse projects manually is the easy way to get started. If Hadoop is going to be part of an automated build process, you will want to do this in Maven, which can generate Eclipse projects for you. This blog post tells you how. If you want to compile Hadoop from source using Eclipse, this blog post shows how.
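As a sketch of the Maven route (assuming the maven-eclipse-plugin, which was the usual tool for this at the time; the version number here is illustrative), you declare the plugin in your pom.xml and then run `mvn eclipse:eclipse` to generate the `.project` and `.classpath` files that Eclipse imports:

```
<!-- pom.xml (illustrative fragment): the maven-eclipse-plugin generates
     Eclipse project metadata from the Maven build; after adding it,
     run "mvn eclipse:eclipse" (and "mvn eclipse:clean" to remove it) -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-eclipse-plugin</artifactId>
      <version>2.9</version>
    </plugin>
  </plugins>
</build>
```

This keeps Maven as the source of truth for dependencies, so the Eclipse project stays in sync with what the automated build actually compiles against.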
Whether you want to start writing MapReduce code or debug existing code, the QuickStart VM will help you do it quickly and easily. This screencast walks you through it and gets you coding in your favorite IDE.