Data Engineering Technology Tree

Blog Summary:
  • Data engineering requires in-depth knowledge of big data and many different technologies.
  • The skills required for data engineering can be imagined as a technology tree, similar to the Civilization series.
  • The technology tree for data engineering starts with a specialization in technology and branches out into systems, programming, and architecture.
  • Different backgrounds, such as DBA/Data Warehouse/SQL-Focused, Software Engineer, and Data Scientist, will have different areas of the technology tree covered.
  • Companies and individuals that try to skip ahead on the technology tree without filling it in will have problems, such as misunderstandings in solutions and incorrect or improper uses of technologies.

“What we know is a drop, what we don’t know is an ocean.”
• Isaac Newton

Data engineering is one of those disciplines where what most people know is just a drop. Some companies say it’s easy and that a drop is all you need to know. My experience in the field and in teaching tells me otherwise. A data engineer needs to learn many different technologies and possess in-depth knowledge of big data.

To help you sort it out, I want to help you imagine the skills as a technology tree. You might have played the Civilization series at some point and maybe even spent way too much time (just one more turn). If you aren’t familiar with it, here is Civilization 6’s technology tree.

Civilization 6’s technology tree

You’ll notice that you start with the most basic technologies in the world, such as pottery or animal husbandry. As you begin to research those technologies, you unlock more technologies. Each of these technologies takes a certain number of turns to research, and that number depends on the science your civilization produces.

If you didn’t know, you can try to skip researching technologies. Instead of gaining all of the foundational knowledge, the player can try to skip ahead. Skipping technologies causes all kinds of problems in-game, just like we’re about to see in our real-life example.

Let’s imagine data engineering as a technology tree. I think it all starts with a specialization in technology and branches out from there. These branches are systems, programming, and architecture. Looking at the diagram below, you can see various relationships.

Data Engineering Technology Tree

At the very end of lots of research and time (turns), we become a data engineer. Ideally, all or the vast majority of the data engineer’s tree is green. Leveraging all of the skills we’ve acquired, we can start to create systems. Those systems will produce data projects.

I hold this technology tree out as a way to gauge your own or your team’s skills on the road to data engineering. Let’s go through a few examples of this tree.
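One way to make this kind of gauging concrete is to treat the tree as a prerequisite graph and ask which skills are still missing for a given background. The sketch below is only an illustration: the skill names come from this post, but the prerequisite edges, the `PREREQUISITES` table, and the `missing_skills` and `skipped_prerequisites` helpers are all assumptions made for the example, not a transcription of the diagram.

```python
# Hypothetical skill tree: each skill maps to the skills it builds on.
# Skill names come from the post; the edges between them are assumptions.
PREREQUISITES = {
    "sql": [],
    "software engineering": [],
    "data formats": [],
    "multi-threading": ["software engineering"],
    "coordination": ["multi-threading"],
    "distributed algorithms": ["data formats", "coordination"],
    "big data ecosystem": ["distributed algorithms"],
    "data engineer": ["sql", "coordination", "big data ecosystem"],
}


def missing_skills(known, goal="data engineer"):
    """Return the skills still to learn for `goal`, prerequisites first."""
    ordered, seen = [], set()

    def visit(skill):
        if skill in seen:
            return
        seen.add(skill)
        # Walk the prerequisites before the skill itself (depth-first).
        for prerequisite in PREREQUISITES[skill]:
            visit(prerequisite)
        if skill not in known:
            ordered.append(skill)

    visit(goal)
    return ordered


def skipped_prerequisites(known):
    """Map each claimed skill to the prerequisites that were skipped."""
    return {
        skill: [p for p in PREREQUISITES[skill] if p not in known]
        for skill in known
        if any(p not in known for p in PREREQUISITES[skill])
    }


# Illustrative backgrounds (assumed skill sets, not measured data).
sql_focused = {"sql", "data formats"}
software_engineer = {"sql", "software engineering", "data formats"}

print(missing_skills(sql_focused))
print(missing_skills(software_engineer))
# A team that jumped straight to the ecosystem without the foundations:
print(skipped_prerequisites({"sql", "big data ecosystem"}))
```

The longer the list returned by `missing_skills`, the more turns of research remain. A non-empty result from `skipped_prerequisites` flags a team that tried to skip ahead on the tree.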

DBA/Data Warehouse/SQL-Focused

Data Engineering Technology Tree for DBAs

Imagine that the team or individual comes from a DBA, Data Warehouse, or SQL-focused background. We can look at the diagram to see which skills (technologies) the team is missing.

We can see that DBAs will have excellent SQL skills, but the rest of the technology tree is missing. The software engineering skills are absent. There may be some understanding of the easier architecture skills, such as data formats, but not of the advanced ones. Using the technology tree, we can see that skill acquisition will be extensive and time-consuming because so many advanced skills are missing.

The companies and individuals that try to skip ahead on the technology tree without filling it in will have all kinds of problems. The lack of software engineering skills forces everything to be written in SQL. The lack of architecture skills leads to incorrect or improper uses of technologies.

Software Engineer

Data Engineering Technology Tree for Software Engineers

Let’s imagine that the team or individual comes from a software engineering background. Looking at the diagram, we can see that they have far more of the technology tree covered, but not the entire tree.

The software engineers will have excellent SQL and software engineering skills. The common missing parts of the tree are the multi-threading and coordination concepts that lead to big data. On the architecture side, they will be missing the big data technology ecosystem knowledge and distributed algorithms.

The companies and individuals that try to skip ahead on the technology tree without filling it in will still have problems. The lack of multi-threading skills that are foundational to big data causes misunderstandings in solutions. The absence of ecosystem knowledge leads to incorrect or improper uses of technologies. I’ve found that these teams get stuck trying to go exhaustively through each potential technology without truly understanding any of them.

Data Scientists

Data Engineering Technology Tree for Data Scientists

Another common misconception is around data scientists and data engineers. Often, managers don’t understand the differences between data scientists and data engineers.

Data scientists will have some SQL and software engineering skills. However, these skills are on the beginner to intermediate level. They will be missing the big data technology ecosystem knowledge and likely distributed algorithms on the architecture side.

The companies and individuals that try to use data scientists as data engineers will have problems. The lack of multi-threading skills that are foundational to big data causes misunderstandings in solutions. The absence of ecosystem knowledge leads to incorrect or improper uses of technologies. I’ve found that these teams choose technologies by popularity rather than fit for a use case.

Technology Trees and You

When management is looking to create a new data engineering team or fix an existing one, they need to make sure the data engineers cover the entire technology tree. When a team is under-performing, it is often missing some or all of the technology tree. I suggest you read Data Teams or Data Engineering Teams to understand how to start or fix the team.

Individuals need to make an honest assessment of themselves and where they are on the technology tree. My Ultimate Guide to Switching Careers to Big Data will help you understand the next steps to take.

The implications of technology trees affect both management and individuals. In either case, their technology tree’s completeness will dictate the success or failure of their projects or goals.
