- Big Data technologies will continue to mature over the next 5-10 years.
- Better stories on the things enterprises need will emerge.
- Technologies for metadata management and granular authorization will improve.
- Hadoop MapReduce will gradually phase out, while Apache Spark will mature and stabilize.
- Data Engineers will face difficulty in choosing the right tool for the job due to the increase in technologies.
I’m often asked what I think will happen to Big Data over the next five to ten years. When developers ask, what they really want to know is whether investing their time in becoming a Data Engineer will pay off.
We’re going to see Big Data technologies continue to mature. There will be better stories for the things enterprises need. We’ll see better technologies for metadata management and granular authorization.
We’ll see some technologies, like Hadoop MapReduce, gradually phase out. Others, like Apache Spark, will mature and stabilize. With Hadoop MapReduce there was really only one processing engine. Now and going forward, Spark is accompanied by a bevy of technologies. Some of these handle both batch and streaming (real-time) processing. Others focus on streaming alone.
This increase in technologies will make it more difficult for Data Engineers to choose the right tool for the job. With streaming, the engineer will need to know the tradeoffs for 10+ different streaming technologies.
We’ll see data engineering teams become more standardized and homogeneous across companies. Right now, data engineering teams range from data warehousing teams made up of DBAs, to teams made up solely of programmers, to cross-functional teams that have programmers, DBAs, and analysts or Data Scientists. Data engineering teams will realize the need to be more cross-functional and to have the right makeup of people on the team.
APIs like Apache Beam will change how we interact with data. Instead of Data Engineers having to learn several different APIs, they’ll just learn one. Instead of having to differentiate between Big Data and small data, they won’t; everything will be data. The API won’t change, but the execution engine will.
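To make the "one API, swappable engine" idea concrete, here is a minimal sketch in plain Python. It is not Beam's API: `Pipeline`, `LocalRunner`, `ThreadedRunner`, and `build_pipeline` are hypothetical names invented for illustration, and the thread pool is only a stand-in for a real distributed engine. The point is that the pipeline is described once and the runner is chosen separately.

```python
from concurrent.futures import ThreadPoolExecutor


class Pipeline:
    """Hypothetical engine-agnostic description of a series of transforms."""

    def __init__(self):
        self.steps = []

    def map(self, fn):
        # Record the transform; nothing executes until a runner is chosen.
        self.steps.append(fn)
        return self

    def apply_all(self, record):
        # Apply every recorded transform, in order, to a single record.
        for step in self.steps:
            record = step(record)
        return record


class LocalRunner:
    """Hypothetical runner: executes in-process, one record at a time."""

    def run(self, pipeline, data):
        return [pipeline.apply_all(record) for record in data]


class ThreadedRunner:
    """Hypothetical stand-in for a parallel or distributed engine."""

    def __init__(self, workers=4):
        self.workers = workers

    def run(self, pipeline, data):
        # pool.map preserves input order, so results line up with the data.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(pipeline.apply_all, data))


def build_pipeline():
    # The pipeline is defined once, with no reference to any engine.
    return Pipeline().map(lambda x: x * 2).map(lambda x: x + 1)


if __name__ == "__main__":
    pipeline = build_pipeline()
    data = [1, 2, 3, 4]
    # Same pipeline definition, two different execution engines.
    print(LocalRunner().run(pipeline, data))
    print(ThreadedRunner(workers=2).run(pipeline, data))
```

Both runners produce the same results from the same pipeline definition; only the way the work is executed differs. Beam follows a similar separation, with the runner typically selected through pipeline options rather than by rewriting the pipeline code.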
Companies are going to see the importance of hiring qualified Data Engineers. We’ll continue to see that if you have real Big Data problems, only a qualified Data Engineer will solve them. Companies will learn that unqualified programmers create project failures.
The next five to ten years will see changes for Data Engineers. The fundamentals will stay the same, but the implementations will keep changing. It will be incumbent on Data Engineers to keep up with the latest changes. There will be an ever-increasing demand for qualified Data Engineers. Investing your time now will pay the highest dividends.