Sometimes companies will start writing code or designing a solution before I train there. This is usually a bad idea, and it shows the difference between Big Data and small data. Making a mistake with small data isn’t costly and doesn’t take long to fix. Making a mistake with Big Data is very costly and can take a while to fix.
Companies that start coding before they’ve been trained waste an average of $100,000 to $200,000. I’ve seen this number go as high as $1,000,000 to $1,500,000 for companies that waited months before being trained. For them, training was a way to get out of a deep hole.
These numbers are based on my conversations with the engineers about how much time was spent already, how much time they’ll have to spend fixing things, and the opportunity cost. I’ve written extensively about how training saves you money.
The numbers you just read only cover the time wasted up to that point. They don’t cover the hypothetical “what if” scenario in which the team never received the training. While I’m training a team, I’m paying attention to any bad ideas or abuses of a technology. These are the genesis of major problems down the road, and those problems turn into major wastes of money. On average, these avoided problems would have cost $300,000 to $400,000.
These numbers are based on downtime estimates, extra operations time, and rewrites of code.
Let me give you an example of a company that avoided a “what if” scenario. I was training at a company on real-time distributed systems. They were going to do a real-time, non-time-bounded join. That means two streams would be joined in real time, but the two streams weren’t temporally in sync. It could take an hour or 12 hours for the matching message to come through the system. This kind of join is possible, but their design was over-engineered and operationally fragile.
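To make the fragility concrete, here is a minimal sketch of what a non-time-bounded join looks like. It is not the company's code. The class name `UnboundedJoin`, its methods, and the in-memory dictionaries are all hypothetical stand-ins for whatever distributed state store a real system would use. The point it illustrates is that every unmatched message has to be held until its partner arrives, however long that takes.

```python
from dataclasses import dataclass, field


@dataclass
class UnboundedJoin:
    """Hypothetical two-stream join with no time bound on matching."""

    # Unmatched messages, keyed by join key. Nothing here ever expires.
    pending_left: dict = field(default_factory=dict)
    pending_right: dict = field(default_factory=dict)

    def on_left(self, key, value):
        """Handle a left-stream message; return the joined tuple if matched."""
        if key in self.pending_right:
            return (key, value, self.pending_right.pop(key))
        self.pending_left[key] = value
        return None

    def on_right(self, key, value):
        """Handle a right-stream message; return the joined tuple if matched."""
        if key in self.pending_left:
            return (key, self.pending_left.pop(key), value)
        self.pending_right[key] = value
        return None

    def buffered(self):
        """Number of messages still waiting for a partner."""
        return len(self.pending_left) + len(self.pending_right)


join = UnboundedJoin()
join.on_left("order-1", "created")
join.on_left("order-2", "created")
print(join.buffered())                    # 2 messages waiting
print(join.on_right("order-1", "paid"))   # ('order-1', 'created', 'paid')
print(join.buffered())                    # 1 message still waiting
```

In this sketch, a message whose partner shows up 12 hours later, or never, sits in the buffer the whole time. In a distributed system that buffer has to be stored, replicated, and recovered after failures, which is where the operational cost comes from.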
In talking to the engineer, I found a much simpler and less operationally intense method that still satisfied all of the requirements. The engineer had spent a solid month writing that code. The operations costs would have run to weeks of time, from diagnosing weird problems to outright downtime when the system stopped working.
My $25,000 in training saved that company at least $400,000. Had they come to me before starting, it would have been at least $500,000. I’ll take ROI like that any time.
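As a rough sketch of the arithmetic behind those figures, the snippet below computes the return as net savings divided by the training cost. The function name `roi_multiple` is hypothetical, and this definition of ROI is an assumption; the dollar amounts are the ones quoted above.

```python
def roi_multiple(savings, cost):
    """Net savings per dollar spent: (savings - cost) / cost."""
    return (savings - cost) / cost


training_cost = 25_000

# Training after the engineer had already spent a month on the join.
print(roi_multiple(400_000, training_cost))  # 15.0

# Training before any code was written.
print(roi_multiple(500_000, training_cost))  # 19.0
```

Under that definition, each dollar of training returned about 15 dollars net in the first case and about 19 in the second.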
If you’re still looking at those numbers and thinking they aren’t possible, you’re still thinking in small data terms. Because Big Data technologies are so complex, a mistake or outright misunderstanding is costly.
If you’re starting on a Big Data project or want to become a Data Engineer, I strongly urge you to get training. Otherwise, you’ll be risking hundreds of thousands of dollars.