We Live, Eat, and Breathe This Stuff

Blog Summary: (AI Summaries by Summarizes)
  • Learning big data and its expanding ecosystem takes time and effort, even for those with a background in distributed systems.
  • There are many projects and nuances to learn, making it difficult for newcomers to Big Data to understand the full scope and potential of the technology.
  • When integrating with Hadoop, it's important to consider all of the ecosystem technologies and how they will work together to make a successful project.
  • Data Engineers specialize in and have a deep understanding of the many technologies needed to create Big Data solutions, while Software Engineers may only have a cursory knowledge of Big Data.

The NFL ran a commercial a few years back. It featured various professional NFL athletes doing things you wouldn’t otherwise believe. One showed a quarterback shooting trap with his football instead of a shotgun. I’ve shot trap, and it’s hard enough to do with a shotgun, much less a football.

I see a similar thing with people starting out with technology, especially Big Data. They look at the accomplishments and abilities of the top people and expect themselves to be at that level immediately. That just isn’t going to happen very often.

Conversely, some of the people who specialize in Big Data expect this of others. This isn’t fair to those who are learning or just starting out. They need time to ramp up and learn.

This leads me to the theme of this post. We live, eat and breathe this stuff. I spend the vast majority of my time focused on Big Data. I haven’t really looked at other up-and-coming technologies in other areas. There are more than enough in Big Data.

Despite a good background in distributed systems, it still took me a good six months before I felt I really understood all of Hadoop and its expanding ecosystem. There really is a vast number of projects to learn. It’s even more difficult to learn the nuances of similar systems (e.g., Cassandra versus HBase).

Most newcomers to Big Data don’t realize this. When people talk about Hadoop, they’re generally referring to Hadoop and its ecosystem at the same time. The question “How do I integrate X with Hadoop?” should really be translated as “What are all of the ecosystem technologies needed to integrate X, and how will they all work together?” The two questions are very different, and the gap between them reflects the experience necessary to make a project successful.

This is the crux of the difference between a Software Engineer and a Data Engineer. Software Engineers have a cursory knowledge of Big Data. Data Engineers have specialized in and learned the many technologies needed to create a Big Data solution. Data Engineers live, eat and breathe this stuff.
