Data Engineering Technology Tree

Jesse Anderson
December 16, 2020

Blog Summary: (AI Summaries by Summarizes)

Data engineering requires in-depth knowledge of various technologies and big data.
Visualizing skills as a technology tree can help in understanding the complexity of data engineering.
Skipping foundational knowledge in data engineering can lead to problems similar to skipping technologies in a game like Civilization.
Becoming a proficient data engineer involves acquiring a broad range of skills represented by the technology tree.
Different backgrounds like DBA, Data Warehouse, or SQL-focused have varying levels of coverage in the technology tree.

“What we know is a drop, what we don’t know is an ocean.”
• Isaac Newton

Data engineering is one of those disciplines where what you know is just a drop. Some companies say it’s easy and that a drop is all you need. My experience in the field and in teaching tells me otherwise. A data engineer needs to learn many different technologies and possess in-depth knowledge of big data.

To help you sort it out, I want to help you imagine the skills as a technology tree. You might have played the Civilization series at some point and maybe even spent way too much time (just one more turn). If you aren’t familiar with it, here is Civilization 6’s technology tree.

You’ll notice that you start with the most basic technologies in the world, such as pottery or animal husbandry. As you research those technologies, you unlock more technologies. Each technology takes a certain number of turns to research, and the number of turns is based on the science your civilization produces.

If you didn’t know, you can try to skip researching technologies. Instead of gaining all of the foundational knowledge, the player can try to skip ahead. Skipping technologies causes all kinds of problems in-game, just like we’re about to see in our real-life example.

Let’s imagine data engineering as a technology tree. I think it all starts with a specialization in technology and branches out from there. These branches are systems, programming, and architecture. Looking at the diagram below, you can see various relationships.

At the very end of lots of research and time (turns), we become a data engineer. Ideally, all or the vast majority of the data engineer’s tree is green. Leveraging all of the skills we’ve acquired, we can start to create systems. Those systems will produce data projects.
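The prerequisite idea behind a technology tree can be sketched as a small dependency graph. This is only an illustration: the skill names and edges below are hypothetical and are not a transcription of the diagram. The sketch uses Python's standard `graphlib` module (Python 3.9+) to produce one valid research order and to list which skills are unlocked by what is already known.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisites: skill -> skills that must be learned first.
# Node names and edges are illustrative assumptions, not the post's diagram.
PREREQUISITES = {
    "sql": set(),
    "programming": set(),
    "data_formats": set(),
    "multi_threading": {"programming"},
    "distributed_algorithms": {"multi_threading"},
    "big_data_ecosystem": {"data_formats", "distributed_algorithms"},
    "data_engineer": {"sql", "big_data_ecosystem"},
}


def research_order(prerequisites):
    """Return one valid order in which every skill follows its prerequisites."""
    return list(TopologicalSorter(prerequisites).static_order())


def unlockable(known, prerequisites):
    """Return skills not yet known whose prerequisites are all already known."""
    return {
        skill
        for skill, needs in prerequisites.items()
        if skill not in known and needs <= known
    }


order = research_order(PREREQUISITES)
next_up = unlockable({"sql", "programming"}, PREREQUISITES)
print(order)
print(sorted(next_up))
```

In this toy graph, someone who knows only SQL and programming can pick up data formats or multi-threading next, but cannot jump straight to the big data ecosystem, which mirrors the "no skipping ahead" point.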

I hold this technology tree out as a way to gauge your own or your team’s skills on the road to data engineering. Let’s go through a few examples of this tree.

DBA/Data Warehouse/SQL-Focused

Data Engineering Technology Tree for DBAs

Imagine that the team or individual comes from a DBA, Data Warehouse, or SQL-focused background. We can look at the diagram to see which skills (technologies) the team is missing.

We can see that the DBAs will have excellent SQL skills. The rest of the technology tree is missing. The software engineering skills are missing. There may be some understanding of the easier architecture skills such as data formats, but the rest of the advanced skills are missing. Using the technology tree, we can see that the skills acquisition will be extensive and time-consuming because the advanced skills are missing.
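As a rough sketch of how such a gap assessment could be made concrete, the snippet below compares a known skill set against a flat list of skills. The skill list and the DBA profile are assumptions drawn loosely from the skills named in this post (excellent SQL, some data formats knowledge); they are not the actual nodes of the diagram, and a plain fraction is a simplification that ignores skill depth.

```python
# Hypothetical flat list of skills, loosely based on those named in the post.
SKILLS = [
    "sql",
    "software_engineering",
    "multi_threading",
    "coordination",
    "data_formats",
    "big_data_ecosystem",
    "distributed_algorithms",
]

# Assumed starting point for a DBA: strong SQL plus some data formats knowledge.
DBA_PROFILE = {"sql", "data_formats"}


def gauge(known, skills=SKILLS):
    """Return (fraction of skills covered, list of missing skills in tree order)."""
    missing = [skill for skill in skills if skill not in known]
    coverage = (len(skills) - len(missing)) / len(skills)
    return coverage, missing


coverage, missing = gauge(DBA_PROFILE)
print(f"covered: {coverage:.0%}")
print("missing:", ", ".join(missing))
```

The length of the missing list is a crude proxy for how extensive and time-consuming the skills acquisition will be.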

The companies and individuals that try to skip ahead on the technology tree without filling it in will have all kinds of problems. The lack of software engineering skills forces all code to be written in SQL, regardless of the technology involved. The lack of architecture skills leads to incorrect or improper uses of technologies.

Software Engineer

Let’s imagine that the team or individual comes from a software engineering background. Looking at the diagram, we can see that they have far more of the technology tree covered, but not the entire tree.

The software engineers will have excellent SQL and software engineering skills. The common missing parts of the tree are the multi-threading and coordination concepts that lead to big data. On the architecture side, they will be missing the big data technology ecosystem knowledge and distributed algorithms.

The companies and individuals that try to skip ahead on the technology tree without filling it in will still have problems. The lack of the multi-threading skills that are foundational to big data causes misunderstandings in solutions. The absence of ecosystem knowledge leads to incorrect or improper uses of technologies. I’ve found that these teams get stuck trying to go exhaustively through each potential technology without truly understanding it.

Data Scientists

Another common misconception is around data scientists and data engineers. Often, managers don’t understand the differences between data scientists and data engineers.

Data scientists will have some SQL and software engineering skills. However, these skills are on the beginner to intermediate level. They will be missing the big data technology ecosystem knowledge and likely distributed algorithms on the architecture side.

The companies and individuals that try to use data scientists as data engineers will have problems. The lack of the multi-threading skills that are foundational to big data causes misunderstandings in solutions. The absence of ecosystem knowledge leads to incorrect or improper uses of technologies. I’ve found that these teams choose technologies by popularity rather than by fit for a use case.

Technology Trees and You

When management is looking to create a new or fix an existing data engineering team, they need to make sure the data engineers have the entire technology tree. When a team is under-performing, often they’re missing some or all of the technology tree. I suggest you read Data Teams or Data Engineering Teams to understand how to start or fix the team.

As an individual, you need to make an honest assessment of yourself and where you are on the technology tree. My Ultimate Guide to Switching Careers to Big Data will help you understand the next steps to take.

The implications of technology trees affect both management and individuals. In either case, their technology tree’s completeness will dictate the success or failure of their projects or goals.

Frequently Asked Questions (AI FAQ by Summarizes)

What is required for data engineering?

Data engineering requires in-depth knowledge of various technologies and big data.

How can visualizing skills as a technology tree help in data engineering?

Visualizing skills as a technology tree can help in understanding the complexity of data engineering.

What can happen if foundational knowledge in data engineering is skipped?

Skipping foundational knowledge in data engineering can lead to problems similar to skipping technologies in a game like Civilization.

What is involved in becoming a proficient data engineer?

Becoming a proficient data engineer involves acquiring a broad range of skills represented by the technology tree.

How do different backgrounds impact coverage in the technology tree?

Different backgrounds like DBA, Data Warehouse, or SQL-focused have varying levels of coverage in the technology tree.

What challenges may arise if companies skip ahead on the technology tree without filling in the gaps?

Companies and individuals skipping ahead on the technology tree without filling in the gaps may face challenges in project implementation.

What skills do software engineers typically have in data engineering?

Software engineers typically have strong SQL and software engineering skills but may lack advanced big data knowledge.

Why can misunderstandings occur between data scientists and data engineers?

Misunderstandings between data scientists and data engineers can arise due to differences in skill levels and knowledge.

Why is ensuring complete technology tree coverage crucial for data engineering success?

Ensuring complete technology tree coverage is crucial for the success of data engineering teams and projects.

How can individuals plan career transitions effectively in data engineering?

Individuals need to assess their position on the technology tree to understand their skill gaps and plan career transitions effectively.

