When You Have the Wrong Team for Big Data

Blog Summary: (AI Summaries by Summarizes)
  • The right skills and people are crucial for the success of a Big Data project.
  • Two teams were made up of the wrong skills and people, resulting in unsuccessful projects.
  • The first team was a data warehousing team that was trained on Python programming and Big Data technologies. They lacked programming and systems design knowledge, resulting in a grossly inefficient system that could barely accomplish business goals. The team was unable to handle the 10x increase in complexity when learning Big Data technologies, and the project was not successful.
  • The second team was a data warehousing team at a marketing company that spearheaded a Big Data project. The project was brittle, broke all the time, and was held together with duct tape in the form of bash scripts and Hive queries. The team lacked programming skills and the project was only able to do a small percentage of what the business needed. The company plans to re-architect the solution with the same team, but should expect poor results.
  • A team that lacks the right skills and people will fail outright or limp along.

In my book, Data Engineering Teams, I talk about the right skills and people to be on a data engineering team. The right skills and people are incredibly important to the success, or failure, of a Big Data project.

Sometimes it’s easier to understand this point with some real examples. Instead of telling you what the team should look like, I’m going to share the stories of two teams that were made up of the wrong skills and people. More importantly, I’m going to share the outcomes of their projects.

Data Warehousing Team Takes on Big Data

I taught at a large insurance company that was experiencing Big Data problems. They tried to solve these problems by using their existing data warehousing team. The idea was that they would train their SQL-focused team on Python programming and Big Data technologies.

The team had been told to memorize the Python API before I came. This showed a deep misunderstanding of what programming is and what’s difficult about it. Only one student had spent any time learning to program before the class, and she had the best odds of getting anything accomplished.

The team told me about the systems they had created. They suffered from a fundamental lack of systems design and programming knowledge. As a direct result, the system was grossly inefficient and could barely accomplish the business goals. The team could only accomplish the business goals because the business acquiesced on every requirement. The system did about 10% of what the business needed.

Once the team started learning the Big Data side, the 10x increase in complexity just made their eyes glaze over. A student asked how a specific use case could use Big Data. I answered the question and explained how a correctly designed system would blow away their current one. During the break, an employee who wasn’t in the class asked a student who was about that same use case. The student said it wasn’t possible.

I circled back with the team a few months later. Neither the project nor the Big Data rollout was successful.

Limping Along with Big Data

I tried to work with a medium-sized marketing company that told me about the current state of their Big Data project. The project kind of worked, but it could only do a small percentage of what the business needed. The project was spearheaded by the company’s data warehousing team.

The project was in production and was super brittle. It broke all the time, and much of the team’s operations time was spent fixing the issues. The processing had to run every hour, yet every run was kicked off manually.

The project was held together with duct tape in the form of bash scripts and Hive queries. This meant that any updates or fixes were incredibly difficult and untestable.
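To make the “duct tape” pattern concrete, here is a hypothetical sketch of what such a wrapper often looks like. Everything in it is invented for illustration (the table names, the stubbed `hive` function simulating a failed query), but it shows why these scripts are untestable and brittle: nothing checks exit codes, so a failed step still reports success.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "duct tape" hourly pipeline (all names invented).
# The real hive CLI is stubbed out so the sketch runs anywhere; the stub
# pretends the query failed, as brittle production jobs often do.
hive() { echo "hive $*" >/dev/null; return 1; }  # stub: simulate a failure

HOUR=$(date +%Y%m%d%H)

# No `set -e` and no exit-code checks: step 2 runs even though step 1 failed,
# so a single bad hour silently corrupts the downstream table.
hive -e "INSERT OVERWRITE TABLE staging PARTITION (hr='${HOUR}') SELECT ..."
hive -e "ALTER TABLE events ADD PARTITION (hr='${HOUR}')"

STATUS=ok  # the wrapper never looked at the exit codes above
echo "hourly load ${HOUR}: ${STATUS}"
```

Because the orchestration lives in an untyped shell script with no error handling, there is nothing to unit test, and every fix is effectively an experiment run in production.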

The company plans to re-architect the solution with the same team. They should expect equally poor results. I decided not to engage with the company.

Why Is This Happening?

These are just two of many stories. They all share a common thread: a team that doesn’t have the right skills and people fails outright or limps along. I’ve wanted to share these stories for a while, but I held off until I had more data points to validate what I’ve been seeing.

When the only hammer you have is SQL, everything looks like a nail, and you get some real abominations. One of the biggest differences between data warehousing and Big Data is the programming and systems design involved. If a team lacks programming skills, they will try to solve the problem with the (wrong) tools that they know.

There is a 10x increase in complexity when going to Big Data. If a data warehousing team is barely able to keep up with the complexity of a small data system, they won’t be able to handle the increase in complexity.

As a direct result of the lack of programming skills, systems design, and the added complexity, the team gets into a vicious cycle of low performance. The system is operationally fragile and breaks because it wasn’t well designed. Because the system breaks all the time and the team lacks the programming skills to fix it properly, the team spends all of its time plugging holes instead of improving the system. Often, the team doesn’t even know how the system could be improved.

If this sounds like a scenario that you’re currently experiencing or want to avoid, I mentor teams to fix failing projects.
