How much do companies lose before training?

Sometimes companies start writing code or designing a solution before I've trained their team. This is usually a bad idea, and it shows the difference between Big Data and small data. Making a mistake with small data isn’t costly and doesn’t take long to fix. Making a mistake with Big Data is very costly and can take a while to fix.

Companies that start coding before they’ve been trained waste an average of $100,000 to $200,000. I’ve seen this number go as high as $1,000,000 to $1,500,000 for companies that waited months before being trained. For them, training was a way to get out of a deep hole.

These numbers are based on my conversations with the engineers about how much time was spent already, how much time they’ll have to spend fixing things, and the opportunity cost. I’ve written extensively about how training saves you money.

The numbers you just read only cover the time wasted up to the point of training. They don’t cover the hypothetical “what if” of never receiving the training. While I’m training a team, I’m paying attention to any bad ideas or abuses of a technology. These are the genesis of major problems down the road, and those problems turn into major wastes of money. The average cost of these avoided problems is $300,000 to $400,000.

These numbers are based on downtime estimates, extra operations time, and rewrites of code.

What If

Let me give you an example of a company that avoided a “what if” scenario. I was training at a company on real-time distributed systems. They were going to do a real-time, non-time-bounded join. That means two streams would be joined in real time, but the two streams weren’t temporally in sync. It could take an hour or 12 hours for the matching message to come through the system. This design is possible, but it was over-engineered and operationally fragile.
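To make the shape of that problem concrete, here is a minimal Python sketch of a join with no time bound. It is not the company's code. The class name, method names, and message fields are hypothetical, and it assumes exactly one message per key on each side. The point it illustrates is that every unmatched message has to be held in state until its partner arrives, however long that takes.

```python
class UnboundedStreamJoin:
    """Hypothetical sketch: join two keyed streams with no time limit on waiting."""

    def __init__(self):
        # Unmatched messages per side, keyed by join key.
        # Nothing here ever expires, so state grows while partners are missing.
        self.pending = {"left": {}, "right": {}}

    def on_message(self, side, key, value):
        """Return a joined tuple if the partner is already waiting, else buffer."""
        other = "right" if side == "left" else "left"
        if key in self.pending[other]:
            match = self.pending[other].pop(key)
            left, right = (value, match) if side == "left" else (match, value)
            return (key, left, right)
        self.pending[side][key] = value
        return None

    def pending_count(self):
        """Number of messages still waiting for a partner."""
        return len(self.pending["left"]) + len(self.pending["right"])


join = UnboundedStreamJoin()
first = join.on_message("left", "order-1", {"amount": 40})   # partner not here yet
waiting = join.pending_count()                                # message is buffered
joined = join.on_message("right", "order-1", {"status": "shipped"})
```

In a real distributed system that buffer would also have to survive restarts and be shared across machines, which is where much of the operational burden of this kind of design comes from.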

In talking to the engineer, I found a much simpler and less operationally intense method. It still satisfied all of the requirements. The engineer had spent a month solid writing that code. The operations costs would have been weeks of time from diagnosing weird problems to outright downtime from the system not working.

My $25,000 in training saved that company at least $400,000. Had they come to me before starting, the savings would have been at least $500,000. I’ll take ROI like that anytime.
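The arithmetic behind that return can be sketched in a few lines of Python. The dollar figures come from the paragraph above; the function names and the two ways of expressing the return (a simple multiple, and net ROI as a percentage) are illustrative choices, not a formula the post itself states.

```python
training_cost = 25_000
savings_after_start = 400_000    # "at least" figure: training came after a month of coding
savings_before_start = 500_000   # "at least" figure: had training come first


def return_multiple(savings, cost):
    """Dollars saved per dollar spent on training."""
    return savings / cost


def net_roi_percent(savings, cost):
    """Net return on investment: (savings - cost) / cost, as a percentage."""
    return (savings - cost) / cost * 100


multiple_after = return_multiple(savings_after_start, training_cost)
multiple_before = return_multiple(savings_before_start, training_cost)
roi_after = net_roi_percent(savings_after_start, training_cost)
roi_before = net_roi_percent(savings_before_start, training_cost)
```

With these inputs the training returns at least 16 times its cost as delivered, and at least 20 times its cost had it come before any code was written.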

If you’re still looking at those numbers and thinking it isn’t possible, you’re still thinking in small data terms. Due to its sheer complexity, a mistake or outright misunderstanding of Big Data technologies is costly.

If you’re starting a Big Data project or want to become a Data Engineer, I strongly urge you to get training. Otherwise, you’ll be risking hundreds of thousands of dollars.
