What Happens When Data Science Teams Add A Data Engineer

Blog Summary: (AI Summaries by Summarizes)
  • Organizations are recognizing the importance of data engineering, but there is often a misunderstanding that simply adding a data engineer to the team will solve all problems.
  • Data science teams may not fully buy into the critical nature of data engineering, leading to challenges in collaboration and understanding the role of data engineers.
  • Hiring the wrong data engineers can lead to a cascade of problems within the team, including underperformance and technical debt.
  • Projects can suffer when the team lacks the necessary expertise in data engineering, leading to confusion and inefficiencies in decision-making.
  • The first data engineer in a team may face daunting challenges in understanding existing code and architecture, especially if technical debt has accumulated.

By Jesse Anderson and Mikio Braun

Organizations are gradually getting the message about the critical nature of data engineering. Data science teams are getting that message too. Sometimes, though, the message gets muddled, and data science teams conclude that they just need to add a data engineer or two. In their minds, this solves the problem, and everyone can go back to business as usual. We’d like to share what we’ve seen happen in these cases and why this isn’t the right course of action.


Buy-In

The core issue is that data science teams don’t fully buy into the notion that data engineering is critical to success. Instead, there is a “there, I did it” or “there, I fixed it” mentality. Naturally, the data science work itself is what’s on their minds, and they often don’t know enough to understand the challenges in data engineering. In addition, the amount of time data engineering takes relative to the data science work is often seen as a problem, again without a full understanding of why. In Data Teams, Jesse recommends a ratio of 2-5 data engineers per data scientist.

Hiring

Many problems trace their way back to hiring. Put simply, data scientists often hire the wrong engineers, and it just gets worse from there.


Hiring the wrong people can have many root causes. For example, data scientists may not believe that data engineering can help or that it is all that difficult. Or they may completely misunderstand data engineering because they have worked with the wrong kind of data engineer, which only reinforces a flawed archetype of what a data engineer is. We’ve also seen data science teams change the title of their most data engineering-savvy data scientist to data engineer. This usually puts the data scientist with the most engineering skill on the job, but compared with actual data engineers, that person is the least qualified.

Poor hiring becomes a self-fulfilling prophecy. Not knowing how to evaluate data engineering candidates, the data science team chooses the wrong person, which leads to underperformance, and that experience teaches the team little about how to hire correctly. The cycle repeats itself and hardens into a strong bias.

Getting The Project Underway

The project gets underway. There are so many technologies to choose from and too many things to build and fix. How should the data engineer start making headway when they can’t yet understand what’s in front of them?

Projects staffed with the wrong people end up as questions on Reddit. They usually say something like, “I was just hired, and I don’t really know what to do. Here is what they’re asking for. Could you help me choose some technologies?” The responses are well-meaning but miss crucial information because the original post leaves it out. Some suggestions are flat-out wrong. This leaves the unqualified data engineer trying to implement something they couldn’t understand or vet in the first place (see the issues with using beginners in Chapter 10, “Starting a Team,” of Data Teams). The failure leaves the business and its value creation in the same place as before, or worse.

Performing Surgery

Being the first data engineer to work with the data scientists’ code and architecture can be daunting, especially if the data scientists have created a mountain of technical debt.

Making any progress can require the most delicate surgery: fixing technical debt without breaking the entire system. From a personnel standpoint, it takes a qualified data engineer even to attempt the fix. More likely, it will take a whole team of data engineers to make the necessary fixes and rearchitect the system. With the wrong person in the role, you will find yourself worse off than before (see the self-fulfilling prophecy above).

Outnumbered and Outgunned

When data engineers are outnumbered, they’re often outvoted and outgunned. The issues, tasks, and challenges that matter to data engineers aren’t seen as important, or aren’t understood at all, by the data scientists on the team.

Data scientists tend to perceive the issues a data engineer raises as too expensive, as slowing down the data science, or as over-engineering. Without a stronger voice on the team, data engineers are easily overlooked or shouted down. For example, data engineers will spot the issues and poor design that created the data scientists’ technical debt in the first place, and the data scientists will veto the fixes because they would slow the team down or seem unnecessary.

In the worst cases, all of the data engineer’s ideas and changes are ignored while the data engineer is assigned the menial tasks the data scientists don’t want to do. It’s a poor match on both sides of the equation.

What Do The Problems Look Like?

If you’re a data engineer on one of these teams, you already know what it looks like. Nothing is changing; you’re frustrated and looking for a new position.

For management, this looks like you’ve added a data engineer expecting it to fix a problem, and nothing has changed. The status quo remains. You’ve simply added a person without fixing the deeper organizational issues that got you here. We’ve helped many organizations in this situation, but there isn’t a one-size-fits-all fix. The first steps involve management and organizational change, not the individual contributors. We’d love to sort out the problems and give you clarity on the next steps. You can contact us here to set a time to talk.


Frequently Asked Questions (AI FAQ by Summarizes)

What is a common misunderstanding about the importance of data engineering in organizations?

There is often a misunderstanding that simply adding a data engineer to the team will solve all problems.

What challenges can arise in collaboration between data science teams and data engineers?

Data science teams may not fully buy into the critical nature of data engineering, leading to challenges in collaboration and understanding the role of data engineers.

What are the potential consequences of hiring the wrong data engineers?

Hiring the wrong data engineers can lead to a cascade of problems within the team, including underperformance and technical debt.

How can projects suffer when lacking expertise in data engineering?

Projects can suffer when the team lacks the necessary expertise in data engineering, leading to confusion and inefficiencies in decision-making.

What challenges may the first data engineer in a team face?

The first data engineer in a team may face daunting challenges in understanding existing code and architecture, especially if technical debt has accumulated.

How might data engineers feel within such a team?

Data engineers can feel outnumbered and outgunned, leading to their issues being overlooked or undervalued by data scientists.

What can poor communication and decision-making lead to for data engineers?

Poor communication and decision-making can result in data engineers being assigned menial tasks or having their ideas ignored, creating a mismatch in responsibilities.

What may data engineers do if organizational issues persist despite their addition to the team?

Data engineers may become frustrated and seek new opportunities if organizational issues persist despite their addition to the team.

What approach should management take to address data engineering challenges effectively?

Management must address deeper organizational issues rather than relying solely on adding individual contributors to the team to solve data engineering challenges.

What is crucial for resolving data engineering problems effectively?

Seeking clarity on the next steps and addressing organizational and management issues is crucial for resolving data engineering problems effectively.
