Q and A: Is a Data Engineer the same thing as a BI or DBA?

Blog Summary:
  • A Data Engineer is someone who specializes in creating software solutions around data, predominantly based around Hadoop, Spark, and the open source Big Data ecosystem.
  • Data Engineers are not the same as DBAs, Business Intelligence (BI) developers, Data Analysts, or ETL Developers, but people with these titles can become Data Engineers with training and new skills.
  • Data Engineers are tasked with creating data pipelines and data products, which are often outside the abilities of non-programmers because they require custom programming and code.
  • A Data Engineer's primary language needs to be Java, but they also need to know SQL and at least one other language, such as Python or Scala.
  • Virtually every project in the Big Data ecosystem has a Java API, but some pipelines will be a mix of Java, SQL, and a dynamic language.

Today’s blog post comes from a question from a subscriber on my mailing list. The question comes from Alpesh D.:

I have been getting your emails and they all seem to make sense. However, did I understand it correct that you believe all big data engineers need to be to use Java? I come from a heavy SQL, MPP data warehousing and BI background. With having done shell scripting from my days when I was a DBA I am able to pick up Python and move ahead but Java seems like a little too much. What are your thoughts?

I think your question could be restated as two questions:

  • Is a Data Engineer the same thing as a BI or DBA?
  • Does a Data Engineer need to use Java?

Is a Data Engineer the same thing as a BI or DBA?

A Data Engineer is someone who has specialized their skills in creating software solutions around data. Their skills are predominantly based around Hadoop, Spark, and the open source Big Data ecosystem. They usually program in Java, Scala, or Python. They have an in-depth knowledge of creating data pipelines. A data pipeline is how data is brought in, processed, and turned into some kind of business value. This business value usually takes the form of reports, analytics, and dashboards. More advanced examples are fraud analytics or predictive analytics pipelines.
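To make the three pipeline stages concrete, here is a minimal sketch in plain Java. It is only an illustration and not how a production Hadoop or Spark pipeline is written. The class name ToyPipeline, the "account,amount" input format, and the threshold rule are all assumptions made up for this example. The point is only the shape: ingest, process, report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy pipeline: ingest raw lines, process them into per-account totals,
// then report accounts whose total exceeds a threshold.
// All names and the "account,amount" format are hypothetical.
class ToyPipeline {

    // A parsed input record.
    static final class Txn {
        final String account;
        final double amount;

        Txn(String account, double amount) {
            this.account = account;
            this.amount = amount;
        }
    }

    // Ingest: turn raw "account,amount" lines into records, skipping malformed ones.
    static List<Txn> parse(List<String> lines) {
        List<Txn> out = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length != 2) {
                continue; // wrong number of fields
            }
            try {
                out.add(new Txn(parts[0].trim(), Double.parseDouble(parts[1].trim())));
            } catch (NumberFormatException e) {
                // amount is not a number, so the line is skipped
            }
        }
        return out;
    }

    // Process: sum amounts per account (TreeMap keeps keys sorted for stable output).
    static Map<String, Double> totalsByAccount(List<Txn> txns) {
        Map<String, Double> totals = new TreeMap<>();
        for (Txn t : txns) {
            totals.merge(t.account, t.amount, Double::sum);
        }
        return totals;
    }

    // Report: accounts whose total is strictly above the threshold.
    static List<String> flagged(Map<String, Double> totals, double threshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : totals.entrySet()) {
            if (e.getValue() > threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("alice,120.50", "bob,40.00", "alice,900.00", "not a record");
        Map<String, Double> totals = totalsByAccount(parse(raw));
        System.out.println(totals);
        System.out.println(flagged(totals, 1000.0));
    }
}
```

In a real pipeline each stage would typically be backed by an ecosystem tool rather than an in-memory list, but the custom logic in the middle is the part that tends to require programming skills.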

A Data Engineer is not a DBA (Database Administrator), Business Intelligence (BI) developer, Data Analyst, or ETL Developer. That’s not to say a person with one of these titles couldn’t be a Data Engineer. Rather, people with these titles will need training and probably entirely new skills to become a Data Engineer. Usually, they’ll need more programming skills and Big Data skills than most people with these titles have.

Data Engineers are tasked with creating data pipelines and data products. Complex data pipelines are often outside the abilities of non-programmers because they require custom programming and code.

Does a Data Engineer need to use Java?

A Data Engineer’s primary language needs to be Java. They’ll also need to know SQL, and I highly recommend they know at least one other language, such as Python or Scala.

If you look around the Big Data ecosystem, virtually every one of the projects has a Java API. Some projects may support a Java API and another language. That doesn’t mean everything in a data pipeline is limited to Java. Some pipelines will be a mix of Java, SQL, and a dynamic language.

I’ve trained at companies where the data team’s knowledge was limited to SQL. Those teams were severely limited in what they could accomplish. You can do some interesting things with SQL, and I recommend using SQL for some operations. But when SQL is your only tool, you can’t use the other ecosystem tools that don’t have a SQL interface. At those companies, if SQL couldn’t do it, it simply wasn’t done. They had no alternative way to create something else.

Join my mailing list and I might answer your question next time.
