Processing Big Data with MapReduce

Blog Summary:
  • Jesse Anderson has released a new series of screencasts on Hadoop MapReduce, published by Pragmatic Programmers.
  • The screencasts are a great way for beginners to learn about Hadoop.
  • A virtual machine is available with everything needed to run Hadoop, MapReduce, and Eclipse.
  • Source code for the screencasts is available on GitHub.
  • The dataset used in the second episode is the Nasdaq daily stock prices from InfoChimps.

I am proud to announce my latest series of screencasts on Hadoop MapReduce. It’s published again by the good people at Pragmatic Programmers. These screencasts are the best way for a beginner to learn about Hadoop, unless they’re sitting in my class at Cloudera University.

Here are a few links to get started after you’ve purchased the screencasts:

First, you want a way to run Hadoop, MapReduce and Eclipse. There is a virtual machine that is set up and running with everything you need. I have a mini-screencast showing how to use Eclipse and debug things.

Next, you’ll need the source code for the screencast. The first and third episodes’ source code is here and the second episode’s source code is here.

Finally, you’ll need the dataset for the second episode. It uses the Nasdaq daily stock prices from InfoChimps.

The screencasts focus on the developer side of things, not on administration and installation. The source code is written to run on the Cloudera QuickStart VM out of the box.
