Why Data Science Teams Don’t Think They Need Data Engineering

Blog Summary: (AI Summaries by Summarizes)
  • Data science teams may mistakenly believe they don't need data engineering, leading to underperformance and technical debt.
  • Lack of understanding about the role of data engineers can result in unrealistic expectations and underestimation of complexity.
  • Creating repeatable data science processes is crucial for efficiency and maintenance of data products.
  • Ad hoc projects may not require extensive engineering but can limit long-term value and sustainability.
  • Transitioning to long-term projects highlights the importance of data engineering for maintaining scalable and maintainable data products.

Some of the most interesting consultations are when I help data science teams that don’t think they need data engineering. I’ve compiled a list of some of the more common reasons why data science teams believe they don’t need data engineering and why those reasons might not be valid.

Data science teams need data engineering because, without it, the data scientists may be just getting by or severely underperforming. The results of working without a data engineering team leave much to be desired. Commonly, data scientists will create technical debt that data engineers later have to spend time fixing.

Lack of Understanding

For some data scientists, there is a total lack of understanding of what data engineers do. This comes from having only a cursory knowledge of programming and maybe some distributed systems. It leads to a “how hard could it be?” attitude that downplays the complexities that data engineers hide from data scientists.

To help data scientists understand the differences between a data engineer and a data scientist, I created some visualizations that clearly show them.

Repeatable Data Science

Some data science is repeatable. By that, I mean the work is automated and consistent data products are being created and maintained. Some data science is ad hoc and not repeatable. In these scenarios, every project is started from scratch and, once the project is done, is completely discarded.

For ad hoc projects, there’s no big engineering onus. The projects only live for hours, days, or weeks. There’s no real need for any long-lived planning. I’d argue that organizations lose much of the value of data science when everything is so ephemeral.

When ad hoc organizations transition to long-term projects, they bear the brunt of their engineering mistakes. Until then, they’ve been able to escape the data engineering rigors of projects that need to be repeatable and run consistently. They find out the hard way that data engineering isn’t over-engineering; it’s making sure that the data products are maintainable. Creating repeatable data products requires data engineers.

There’s No Scale…Yet

Sometimes organizations start out with small or medium data and don’t have to deal with scale issues (count yourselves lucky). They’ve been able to get by with Excel, single processes, or waiting longer for results. The transition to big data and scale catches them by surprise.

The transition to big data technologies comes with a significant increase in complexity due to the distributed systems involved. At first, the data scientists think they can handle the growth. It quickly becomes apparent that they can’t deal with the increase in complexity and need data engineers.

Creating scalable data products requires data engineers.

What It Looks Like and What to Do

If your team is experiencing one of these problems, it will look like the data science team is stuck. They’ll spend a week on something that seems like it should take hours or a day. They’ll spend hours googling or searching on StackOverflow for answers (these sorts of solutions aren’t findable on Google or StackOverflow). The data scientists simply won’t be technically competent enough to realize the issue. These sorts of problems fall right into the wheelhouse of data engineering.

Managers and data scientists will need to take an honest look at the team’s productivity and skills. They more than likely will need data engineers and need to establish a data engineering team. I cover how to start and resource a data engineering team in my Data Teams book.

Frequently Asked Questions (AI FAQ by Summarizes)

Why do data science teams sometimes mistakenly believe they don't need data engineering?

Data science teams may mistakenly believe they don't need data engineering, leading to underperformance and technical debt.

What can happen if there is a lack of understanding about the role of data engineers?

Lack of understanding about the role of data engineers can result in unrealistic expectations and underestimation of complexity.

Why is creating repeatable data science processes crucial?

Creating repeatable data science processes is crucial for efficiency and maintenance of data products.

Why may ad hoc projects not require extensive engineering?

Ad hoc projects may not require extensive engineering but can limit long-term value and sustainability.

Why is data engineering important for transitioning to long-term projects?

Transitioning to long-term projects highlights the importance of data engineering for maintaining scalable and maintainable data products.

What challenges may organizations face when scaling up from small or medium data?

Organizations starting with small or medium data may face challenges when scaling up, requiring a shift to big data technologies.

Why does the complexity of distributed systems in big data technologies often require data engineers?

The complexity of distributed systems in big data technologies often necessitates the involvement of data engineers.

What technical challenges may data science teams struggle with that fall within the expertise of data engineers?

Data science teams may struggle with technical challenges that fall within the expertise of data engineers.

How can evaluating team productivity and skills reveal the need for a dedicated data engineering team?

Evaluating team productivity and skills may reveal the need for a dedicated data engineering team to enhance overall performance.

How can establishing a data engineering team improve productivity and address technical challenges effectively?

Establishing a data engineering team can improve productivity and address technical challenges effectively.
