What Do I Look for in Data Engineers?

Blog Summary: (AI Summaries by Summarizes)
  • A strong programming background is crucial for Data Engineers, with many having a Master's degree or above in Computer Science with a focus on distributed systems or data.
  • The best Data Engineers are not content with just programming and have started to cross-train into other fields, such as data science or marketing.
  • Data Engineers are driven to create bigger and more complex systems to create data products that can be used by everyone.
  • Great Data Engineers have a love or at least an interest in data and are inherently curious about what is happening and why.
  • Understanding systems and distributed systems is more important than knowing specific technologies, although knowledge of Big Data technologies and APIs is necessary.

I want to share with you some of the traits that I’ve found in especially good Data Engineers. Not every Data Engineer will have all of these traits, but you will usually find several of them.

I can’t stress enough how important it is for a Data Engineer to have a strong programming background. Data Engineers are usually mid-level to senior in their careers. Those fresh out of school usually have a Master’s degree or above in Computer Science with a focus on distributed systems or data. That said, I have seen some especially bright junior engineers make great contributions to the team.

This will sound odd given how much I talked about the importance of programming, but the best Data Engineers are bored with just programming. That means they’ve mastered or nearly mastered programming as a discipline. Writing another enterprise system or small data project holds little interest for them.

As a result, they’ve started to cross-train into other fields. These fields may be related to programming, such as data science, or unrelated, such as marketing or analysis.

Data Engineers are bored with creating small data systems because those systems aren’t complex enough. They want to create bigger and more complex systems. The main driver for this is their desire to create data products that can be used by everyone.

This desire to create data products comes out of a common love of data. You might have seen a Software Engineer who loves coding or even loves a particular language. They are happiest when coding. Data Engineers love both coding and data. If there isn’t a love, there is at least an interest in data. I’ve found this distinguishes the great Data Engineers from the good Data Engineers.

They dig into data because they are inherently curious about what is happening and why. When they form a hypothesis, they use their data to either prove or disprove it.

I don’t focus on what technologies a Data Engineer knows. I focus on their understanding of systems and distributed systems. They obviously need to know some Big Data technologies and APIs. However, learning a new API or technology is much easier once you know the basic architectural and design patterns of Big Data systems. A Data Engineer who has shown they can learn some Big Data technologies is likely to be able to learn others.

I see this all the time when I train a team that is already working with Big Data technologies. They catch on more quickly to the concepts because there are similarities to the Big Data technologies they already use. The team learns more from the training because they’re not starting from scratch.
