What It Looks Like From the Outside

Blog Summary:
  • Big Data projects often fail due to incorrect assumptions made by management and engineering teams at the beginning of the project.
  • Management may think that Hadoop/Spark/Big Data is a silver bullet or an easy rollout, which leads to problems later on.
  • Teams often assume they will have time to go back and do things right, but they never get that time.
  • Changing data on disk without using a schema that can evolve can cause trouble changing code and push out development timelines.
  • Projects run with this mentality are often cancelled due to a lack of progress, and the post-mortem usually blames the technology.

I teach and mentor teams that have started or are several months into their projects. I see what happens after they’ve experienced problems. I view the teams from the outside looking in. I see the manifestations of problems and I have to figure out what the root of each problem is.

These issues often come from management thinking Hadoop/Spark/Big Data is a silver bullet or that it’s going to be an easy rollout. Once they get deep into the guts of the project, management and engineering find out it isn’t easy. They’re faced with the difficult decision of delaying the project or doing a half-assed job.

These incorrect assumptions, made in a vacuum at the beginning of a project, lead to failure. If you’re embarking on a Big Data project, make sure you’ve read and applied the advice in my Data Engineering Teams book.

The team assumes that somehow they’ll have the time to go back and do it right. They never do, for two main reasons. First, teams are simply never given that time. Second, doing it right means changing data in flight or on disk.

If you’re changing data on disk and didn’t use a schema that can evolve, you’ll have all sorts of trouble changing code. This either becomes a non-starter or pushes out development timelines. Enterprises will also have to convince other teams to make code changes and coordinate those changes with them.
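
To make "a schema that can evolve" a little more concrete, here is a minimal sketch in plain Python. It is not the API of Avro, Protobuf, or any other real library; the customer schema, the MISSING sentinel, and the read_record helper are all made up for illustration. The idea it shows is that a field added after data is already on disk carries a default, so old records can still be read by new code without rewriting them.

```python
# Hypothetical illustration of schema evolution; not the API of any real library.
MISSING = object()  # sentinel: the field has no default, so it is required

# Version 2 of a made-up "customer" schema. "email" was added after data
# was already on disk, so it carries a default for older records.
CUSTOMER_V2 = {
    "id": MISSING,
    "name": MISSING,
    "email": None,
}


def read_record(stored: dict, reader_schema: dict) -> dict:
    """Resolve a stored record against the schema the current code expects."""
    resolved = {}
    for field, default in reader_schema.items():
        if field in stored:
            resolved[field] = stored[field]   # value was written to disk
        elif default is not MISSING:
            resolved[field] = default         # older record: use the default
        else:
            raise ValueError(f"stored record lacks required field {field!r}")
    return resolved


old_record = {"id": 1, "name": "Ada"}  # written before "email" existed
new_record = {"id": 2, "name": "Grace", "email": "grace@example.com"}

print(read_record(old_record, CUSTOMER_V2))
print(read_record(new_record, CUSTOMER_V2))
```

Without the default on the new field, every old record on disk would have to be rewritten before the new code could ship, which is exactly the kind of change that teams never get time for.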

These are the types of projects, and the mentalities behind them, that get cancelled due to a lack of progress. Usually the post-mortem blames the technology. To the outside observer, that’s the reason things failed: there was some kind of technical issue. It takes an honest look at the whole project to truly figure out what caused the problems in the first place.
