What will become of Big Data?

Blog Summary: (AI Summaries by Summarizes)
  • Big Data technologies will continue to mature over the next 5-10 years.
  • Better stories on the things enterprises need will emerge.
  • Technologies for metadata management and granular authorization will improve.
  • Hadoop MapReduce will gradually phase out, while Apache Spark will mature and stabilize.
  • Data Engineers will face difficulty in choosing the right tool for the job due to the increase in technologies.

I’m often asked what I think will happen to Big Data over the next five to ten years. When developers ask, they really want to know whether investing their time in becoming a Data Engineer will pay off.

We’re going to see Big Data technologies continue to mature. There will be better stories for the things enterprises need. We’ll see better technologies for metadata management and granular authorization.

We’ll see some technologies, like Hadoop MapReduce, gradually phase out. We’ll look for other technologies, like Apache Spark, to mature and stabilize. With Hadoop MapReduce, there was really only one processing engine. Now and in the near future, Spark is accompanied by a bevy of technologies. Some of these are focused on both batch and streaming (real-time). Others are focused on just streaming (real-time).

This growth in the number of technologies will make it more difficult for Data Engineers to choose the right tool for the job. With streaming, the engineer will need to know the tradeoffs among 10+ different streaming technologies.

We’ll see data engineering teams become more standardized and homogeneous across companies. Right now, data engineering teams range from data warehousing teams made up of DBAs, to teams made up solely of programmers, to cross-functional teams that have programmers, DBAs, and analysts or Data Scientists. Data engineering teams will realize the need to be more cross-functional and to have the right makeup of people on the team.

APIs like Apache Beam will change how we interact with data. Instead of Data Engineers having to learn several different APIs, they’ll just learn one. They won’t have to differentiate between Big Data and small data; everything will be data. The API won’t change, but the execution engine will.
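To make the "same API, different execution engine" idea concrete, here is a minimal sketch in plain Python. It is not Apache Beam's API: `Pipeline`, `LocalRunner`, and `ThreadedRunner` are hypothetical names invented for illustration, and the thread pool is only a stand-in for a real distributed engine. The point is that the pipeline definition is written once and the runner is swapped underneath it.

```python
from concurrent.futures import ThreadPoolExecutor


class Pipeline:
    """Hypothetical: an ordered list of per-element transforms.

    The pipeline only describes WHAT to do, not how or where it runs.
    """

    def __init__(self):
        self.steps = []

    def map(self, fn):
        self.steps.append(fn)
        return self  # allow chaining

    def apply_all(self, item):
        # Apply every transform, in order, to a single element.
        for fn in self.steps:
            item = fn(item)
        return item


class LocalRunner:
    """Hypothetical engine for small data: one thread, in process."""

    def run(self, pipeline, data):
        return [pipeline.apply_all(x) for x in data]


class ThreadedRunner:
    """Hypothetical stand-in for a distributed engine: fans out to workers."""

    def __init__(self, workers=4):
        self.workers = workers

    def run(self, pipeline, data):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # Executor.map returns results in input order.
            return list(pool.map(pipeline.apply_all, data))


def build_pipeline():
    # Written once; identical no matter which runner executes it.
    return Pipeline().map(str.strip).map(str.lower).map(len)


lines = ["  Big Data ", "small data", " DATA "]
local_result = LocalRunner().run(build_pipeline(), lines)
threaded_result = ThreadedRunner().run(build_pipeline(), lines)
print(local_result == threaded_result)
```

In Beam itself the same separation exists: the pipeline code stays put and the runner (for example a local direct runner, or an engine such as Spark or Flink) is chosen when the pipeline is launched.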

Companies are going to see the importance of hiring qualified Data Engineers. We’ll continue to see that if you have real Big Data problems, only a qualified Data Engineer will solve them. Companies will learn that unqualified programmers create project failures.

The next five to ten years will see changes for Data Engineers. The fundamentals will stay the same, but the implementations will keep changing. It will be incumbent on Data Engineers to keep up with the latest changes. There will be an ever-increasing demand for qualified Data Engineers. Investing your time now will pay the highest dividends.
