Creating real-time data pipelines brings new challenges. There are new concepts and technologies that you’ll need to learn and understand. To help you understand the basic technologies you need in a real-time data pipeline, I break them down into 4 general types....
The move from batch to real-time Big Data represents change. It will entail using brand new technologies and concepts that you haven’t dealt with before. Batch Big Data: Let’s start off by defining batch Big Data. For batch, all data must be there when the...
I wrote a post for the O’Reilly data blog going into my latest thoughts and views on data engineers versus data scientists. I continue on to talk about machine learning engineers. Can you switch careers to Big Data in 4 months or less? If you’re a Software...
There’s an elephant in the room with Big Data. If an organization tries to half-ass its way through a Big Data project, it’s going to fail (usually 5-10% odds of success). Given this really low success rate, should you even do Big Data? When I...
Unit testing your Kafka code is incredibly important. I’ve already written about integration testing, consumer testing, and producer testing. Now, I’m going to share how to unit test your Kafka Streams code. To start off with, you will need to change your...