Hadoop Book Reviews

Blog Summary: (AI Summaries by Summarizes)
  • Two Hadoop books were reviewed: "Hadoop the Definitive Guide 2nd Edition" and "Hadoop In Action" by Tom White and Chuck Lam respectively.
  • "Hadoop the Definitive Guide 2nd Edition" is more focused on programming and has more real-world and applicable code examples.
  • The book goes into better detail about the programming side of things, like debugging and logging.
  • The book also contains more overview chapters on Hadoop-associated projects like Pig and HBase.
  • "Hadoop In Action" is aimed more at people wanting to learn about Hadoop and gives a better overview of setting up and maintaining a Hadoop cluster.

Update: Review of Hadoop the Definitive Guide 3rd Edition

I spent some time reading two Hadoop books: Hadoop the Definitive Guide 2nd Edition by Tom White and Hadoop In Action by Chuck Lam. Both books were well written but seemed to be aimed at different audiences.

Hadoop the Definitive Guide 2nd Edition seems to be aimed more at the programmer. There are lots of code samples, and the author goes through the code line by line and does a great job of explaining why each line is important. I liked this book’s code examples better than those in Hadoop In Action because they seemed more real-world and applicable. White also goes into better detail about the programming side of things, like debugging and logging. If you know enough about MapReduce to be dangerous but want to know about Hadoop’s implementation of it, head to chapter 6, “How MapReduce Works”. I am a visual person and enjoyed this book’s diagrams for understanding the flow; Hadoop In Action doesn’t have any diagrams. The Definitive Guide also contains more overview chapters on Hadoop-associated projects like Pig and HBase.

Hadoop In Action seems to be aimed more at people wanting to learn about Hadoop. It isn’t a cursory look at Hadoop, but it is the book I would recommend to a manager or non-programmer who wants to learn about Hadoop. I would send managers straight to chapter 7, “Cookbook”, which shows how other companies have used this technology. The book also gives a better overview of setting up and maintaining a Hadoop cluster.

If you are starting from scratch with Hadoop, I recommend starting with Hadoop In Action. If you are going straight to coding or already have a handle on MapReduce, I recommend buying Hadoop the Definitive Guide 2nd Edition.
