Is Big Data Cheap?

Jesse Anderson
August 10, 2016
Blog, Business, Data Engineering

Blog Summary:

Big Data is not always cheap: some things are cheap and others are more expensive.
Hadoop is the gold standard for both startups and enterprises, and it is not an open source knockoff of a better closed source framework.
Small data solutions often need a single computer, while in Big Data, many different computers are needed.
Depending on the SLA for the cluster, even a starter cluster can need five or more computers.
Small data engineers tend to be more plentiful and their salaries are lower in comparison to Big Data engineers.

Companies and individuals often come into Big Data thinking everything is cheap. After all, the entire stack is open source, right? Well, some things are cheap and some things are more expensive.

Software

One of the important distinctions with Hadoop is that it isn’t an open source knockoff of a better closed source framework. Hadoop is the gold standard for both startups and enterprises.

That often stands in contrast to small data solutions. Sometimes, companies will have an open source alternative that’s a clone of the commonly used closed source solution. The open source option will lack the polish or features present in the closed source option.

Because Hadoop is used at both startups and enterprises, data engineers can learn one system and use it no matter what size of company they move to.

Hardware

Small data solutions often need a single computer. This computer can serve as the database, application layer, and webserver.

In Big Data, many different computers are needed. If you have true Big Data needs, a single computer won’t be able to handle all of the processing and storage necessary. You’ll need at least three computers just for storage and processing. Often, you’ll colocate the server daemons on these three computers and, as the cluster grows, you’ll move these daemons onto their own computers.

Depending on the SLA for the cluster, some small clusters have high availability (HA) from the beginning. These clusters locate their server daemons on at least two other computers. This means that even a starter cluster can need five or more computers.
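
The arithmetic above can be sketched in a few lines of Python. This is only a rough illustration: the function name, parameters, and constants are hypothetical, and the minimums are simply the counts given in this post, not figures from any vendor's sizing guide.

```python
# Hypothetical sizing helper. The minimums come from the counts in this post.
WORKER_MINIMUM = 3      # at least three computers for storage and processing
HA_MASTER_MINIMUM = 2   # HA puts the server daemons on at least two other computers


def starter_cluster_size(workers=WORKER_MINIMUM,
                         high_availability=False,
                         ha_masters=HA_MASTER_MINIMUM):
    """Return the number of computers a starter cluster needs."""
    if workers < WORKER_MINIMUM:
        raise ValueError("need at least three computers for storage and processing")
    if not high_availability:
        # Server daemons are colocated on the worker computers.
        return workers
    if ha_masters < HA_MASTER_MINIMUM:
        raise ValueError("HA needs at least two computers for the server daemons")
    # Server daemons run on their own computers, separate from the workers.
    return workers + ha_masters


print(starter_cluster_size())                        # no HA requirement
print(starter_cluster_size(high_availability=True))  # HA from the beginning
```

With the defaults, the first call gives three computers and the second gives five, which matches the starter cluster described above. Real clusters will usually need more, depending on data volume and the SLA.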

People

Small data engineers tend to be more plentiful. Due to their lack of specialization, their salaries are lower in comparison to Big Data engineers. Some small data engineers simply won’t be able to make the leap to Big Data.

Even among the engineers who identify as Big Data or Data Engineers, there is a great deal of variation in abilities and experience. Finding a “Data Engineer” who is willing to work for the same amount as a small data engineer should be a red flag. I’ve taught these data engineers and they have a very low probability of success on any given project. The quality of people is one of the first things I look for when judging the probability of success on a team.

Is Big Data Cheaper?

Some things, like software, are cheaper. The rest of Big Data is more expensive. Cheaping out on the expensive parts leads to project failure. It’s one of the most common ways I’ve seen projects fail quickly.
