Most companies aren’t experiencing Big Data or small data problems. They’re experiencing a witching hour of sorts. This is a point in their growth where their data is too big for small data technologies and too small for Big Data technologies. As I teach at companies, I’m finding that as much as 80% of use cases fall into this conundrum.
I’ve taken to calling this space between small data and Big Data “medium data.” Medium data comes from companies that are looking to move from small data to Big Data but don’t quite need Big Data scale. The resulting over-engineering is making companies less successful, and Big Data is getting the blame.
I’ll give a couple of examples from my own experience.
Moving Off NoSQL
I was teaching at a financial company and they told me about a production case. They were using a NoSQL database and were moving it back to an RDBMS.
I took this as a use case in the medium data space because one of the following had probably happened:
- Someone over-engineered or wanted to pad their resume with a NoSQL product in production.
- The person who came along afterward didn’t understand NoSQL and wanted to get it back to a technology they understood.
- The person who architected the system didn’t really understand the pros and cons of the technologies as applied to the use case.
It’s incredibly important to use the right tool for the job, but this one is interesting. If the team had truly been using NoSQL for its strengths and abilities, they shouldn’t have been able to go back to an RDBMS. The RDBMS wouldn’t have been able to scale to the levels that the use case required. Going back is only possible if the use case sits in the medium data space, where both small data and Big Data technologies are viable.
Startups and Small Companies
Most startups and small companies don’t have Big Data requirements. Usually, their usage of Big Data is based on expected growth; they’re expecting to grow into their Big Data technologies.
This is another place I see medium data. These companies are trying to decide when to transition over to Big Data technologies. The timing of this transition is especially important for small companies because of the increase in complexity. The team knows it will have to pay the piper sometime, and it’s choosing to pay early.
Why Does It Matter?
Using Big Data for medium data increases the complexity dramatically. That complexity shows up architecturally, programmatically, and operationally. It means you’re using technologies that are probably overkill for the problems you’re working on.
For a small company without the proper skills and training, this can cut productivity and increase costs. These costs aren’t just operational. They include the higher cost of better programmers, architects, and operations personnel.
I’m starting to see some technologies address medium data. Usually, it’s a Big Data technology scaling down: it becomes easier to deploy or run a much smaller cluster that addresses medium data.
I think this is spot on.
Jesse, what would you say is the right time to move to Big Data? What are the factors that you consider, and I mean going beyond the 3 Vs?
I don’t think the 3 to 5 Vs give a good definition of Big Data. I plan to write a post about it later. My basic definition is a question: what can’t you do because your technology doesn’t allow you to scale?
I am at a company that wanted to be big data but just isn’t there. We spent a year using a tech stack unfit for the size of the data. I am scaling back the architecture to more accurately reflect the size of the data.
I’ve seen too many companies jumping on the Big Data bandwagon without a true Big Data need. In my management classes, I talk about this being one of the first decisions you make. Do you need Big Data or not?
Great post, thanks for sharing. What companies would you say are suffering from this conundrum?
I’m not going to name names of companies with this problem. Just make sure your company has a true Big Data need. You’ll need to verify that need with a qualified Data Engineer.