What It Looks Like From the Outside

Blog Summary: (AI Summaries by Summarizes)
  • Big Data projects often fail due to incorrect assumptions made by management and engineering teams at the beginning of the project.
  • Management may think that Hadoop/Spark/Big Data is a silver bullet or an easy rollout, which leads to problems later on.
  • Teams often assume they will have time to go back and do things right, but they never get that time.
  • Changing data on disk without using a schema that can evolve can cause trouble changing code and push out development timelines.
  • These types of projects and mentalities are often cancelled due to a lack of progress, and the post-mortem usually blames the technology.

I teach and mentor teams that have started or are several months into their projects. I see what happens after they’ve experienced problems. I view the teams from the outside looking in. I see the manifestations of problems and I have to figure out what the root of each problem is.

These issues often come from management thinking Hadoop/Spark/Big Data is a silver bullet or that it’s going to be an easy rollout. Once they get deep into the guts of the project, management and engineering find out it isn’t easy. They’re faced with the difficult decision of delaying the project or doing a half-assed job.

These incorrect assumptions, made in a vacuum at the beginning of a project, lead to failure. If you’re embarking on a Big Data project, make sure you’ve read and applied the advice in my Data Engineering Teams book.

The team assumes that somehow they’ll have the time to go back and do it right. That time never comes, for two main reasons. First, teams are never given the time to go back. Second, going back means changing data in flight or on disk.

If you’re changing data on disk and didn’t use a schema that can evolve, you’ll have all sorts of trouble changing code. This either becomes a non-starter or pushes out development timelines. Enterprises will also have to convince other teams to make code changes and coordinate those changes with them.
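To make "a schema that can evolve" concrete, here is a minimal sketch in plain Python. It is not any particular library's API; the schemas, field names, and the `read_record` helper are all hypothetical. It only illustrates the general idea that a newly added field carries a default, so data already written to disk stays readable after the code changes.

```python
import json

# Hypothetical schemas for illustration only.
# v2 adds "country" WITH a default, which is what keeps v1 data readable.
SCHEMA_V1 = {"fields": [{"name": "user_id"}, {"name": "email"}]}
SCHEMA_V2 = {
    "fields": [
        {"name": "user_id"},
        {"name": "email"},
        {"name": "country", "default": "unknown"},
    ]
}


def read_record(raw, reader_schema):
    """Decode one JSON record and resolve it against the reader's schema.

    Fields missing from the stored record are filled from the schema's
    default; a missing field without a default is an error. Stored fields
    the reader does not know about are dropped.
    """
    stored = json.loads(raw)
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in stored:
            resolved[name] = stored[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return resolved


# A record written months ago under v1, and one written today under v2.
old_on_disk = json.dumps({"user_id": 1, "email": "a@example.com"})
new_on_disk = json.dumps({"user_id": 2, "email": "b@example.com", "country": "DE"})

# The new code reads both without rewriting anything on disk.
old_rec = read_record(old_on_disk, SCHEMA_V2)
new_rec = read_record(new_on_disk, SCHEMA_V2)
```

Without the default on the new field, the old record would fail to load, and the team would be back to rewriting data on disk before any code change could ship.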

These are the types of projects, and the mentalities behind them, that get cancelled due to a lack of progress. Usually the post-mortem blames the technology. To the outside observer, that’s the reason things failed: there was some kind of technical issue. It takes an honest look at the whole project to truly figure out what caused the problems in the first place.
