Companies and individuals often come into Big Data thinking everything is cheap. After all, the entire stack is open source, right? Well, some things are cheap and some things are more expensive.
One of the important distinctions with Hadoop is that it isn’t an open source knock off of a better closed source framework. Hadoop is the gold standard for both startups and enterprises.
That often stands in contrast to small data solutions. Sometimes, companies will have an open source alternative that’s a clone of the commonly used closed source solution. The open source option will often lack the polish or features present in the closed source option.
Since Hadoop is used both at startups and enterprises, this allows data engineers to learn one system and use it no matter what size of company they move to.
Small data solutions often need a single computer. This computer can serve as the database, application layer, and webserver.
In Big Data, many different computers are needed. If you have true Big Data needs, a single computer won’t be able to handle all of the processing and storage necessary. You’ll need at least three computers just for storage and processing. Often, you’ll colocate the server daemons on these three computers and, as the cluster grows, you’ll move these daemons onto their own computers.
Depending on the SLA for the cluster, some small clusters have high availability (HA) from the beginning. These clusters run their server daemons on at least two additional computers, which means that even a starter cluster can need five or more computers.
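To make the arithmetic concrete, here is a rough sketch of the starter cluster sizing described above. The function name and parameters are made up for illustration, and the counts are just the minimums mentioned in this post, not a sizing rule for any particular distribution.

```python
def starter_cluster_size(worker_nodes: int = 3,
                         high_availability: bool = False,
                         ha_daemon_nodes: int = 2) -> int:
    """Estimate how many computers a starter cluster needs.

    Assumptions (taken from the minimums in the text, not a vendor rule):
    - at least three computers for storage and processing
    - without HA, server daemons are colocated on those computers
    - with HA, server daemons run on at least two additional computers
    """
    if worker_nodes < 3:
        raise ValueError("need at least three computers for storage and processing")
    if high_availability and ha_daemon_nodes < 2:
        raise ValueError("HA needs at least two computers for server daemons")

    # Dedicated daemon machines are only added when HA is required.
    extra = ha_daemon_nodes if high_availability else 0
    return worker_nodes + extra


if __name__ == "__main__":
    print("Without HA:", starter_cluster_size())
    print("With HA:", starter_cluster_size(high_availability=True))
```

Even at these minimums, the HA requirement alone adds two computers before any real data shows up.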
Small data engineers tend to be more plentiful. Due to their lack of specialization, their salaries are lower in comparison to Big Data engineers. Some small data engineers simply won’t be able to make the leap to Big Data.
Even among the engineers who identify as Big Data or Data Engineers, there is a great deal of variation in abilities and experience. Finding a “Data Engineer” who is willing to work for the same amount as a small data engineer should be a red flag. I’ve taught these data engineers and they have a very low probability of success on any given project. The quality of people is one of the first things I look for when judging a team’s probability of success.
Is Big Data Cheaper?
Some things, like software, are cheaper. The rest of Big Data is more expensive. Cheaping out on the expensive parts leads to project failure. It’s one of the most common ways I’ve seen projects fail quickly.