When people ask me what they should be learning next, I tell them to start learning real-time Big Data systems. Real-time Big Data is something I’ve been focusing on for the past 5+ years. This is because I saw it as the next trend in Big Data and I was right.
Companies are doing massive rollouts of real-time systems. They're taking their existing batch Big Data systems and upgrading them to real-time Big Data systems. Companies of all sizes, all over the world, are making this leap.
They're doing these rollouts because batch systems were inherently limiting. Often, a company wanted to do things in real-time but couldn't due to technical limitations. Now, they're looking at and reacting to their data as it happens instead of hours later.
But there's a consistent problem: there aren't enough people with real-time Big Data skills. Companies can barely find and hire them. Demand for these skills is high and supply is low, because real-time systems are relatively new and complex.
That's where you come in. You already have Big Data skills with batch systems. With the right training, you can fill those open positions that need real-time Big Data skills.
You already know about this complexity increase because you've been working with batch Big Data. My experience is that batch Big Data is 10 times more complex than small data systems. Real-time Big Data systems are another 5 to 10 times more complex than batch systems.
There are common reasons for this increase in complexity. With real-time, you’ll be using even more of the Big Data ecosystem. You’ll also need to learn, understand, and implement systems with brand-new technologies. These technologies have new concepts that you haven’t seen in batch. You need to understand the various failure scenarios and what they mean in real-time.
Then, there are the tradeoffs between each system. To achieve real-time or near real-time, each system needs to strike a balance between throughput and latency. Each system brings new concepts and implementations that are different from batch.
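To make the throughput/latency tradeoff concrete, here is a toy Python sketch (my own illustration, not course material): each "send" pays a fixed per-call overhead, so larger batches amortize that cost and raise throughput, while each event waits longer before it is processed.

```python
import time

def process_batch(batch):
    # Simulated fixed per-call overhead (e.g. one network round trip),
    # paid once per call no matter how many events the call carries.
    time.sleep(0.001)
    return len(batch)

def run(events, batch_size):
    """Process all events using the given batch size; return elapsed seconds."""
    start = time.time()
    for i in range(0, len(events), batch_size):
        process_batch(events[i:i + batch_size])
    return time.time() - start

events = list(range(500))
t_small = run(events, batch_size=1)    # lowest per-event latency, slowest overall
t_large = run(events, batch_size=100)  # events wait to fill a batch, far higher throughput
print(f"batch size 1:   {t_small:.3f}s")
print(f"batch size 100: {t_large:.3f}s")
```

Real systems make this same tradeoff with knobs like micro-batch intervals or producer batching settings; the right balance depends on how fresh your answers need to be.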
Creating real-time systems is more than just learning the API calls. You’ve been dealing with batch Big Data and know that you need to understand the architecture of the underlying system. Otherwise, you’ll have a program that compiles but never works in production. You need a deeper understanding of the systems to create a solution or pass an interview.
There are four general types of technologies in real-time Big Data systems: processing, analytics, ingestion and dissemination, and storage.
A processor is the part that transforms incoming data. As data comes into a system, it needs to be cleaned and changed into a usable shape. The processor is responsible for getting the data ready for subsequent use.
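A tiny sketch of the processor role (a hypothetical example of mine, with made-up field names): take a raw event off the wire, then normalize fields and types so everything downstream sees clean records.

```python
import json

def process(raw_line: str) -> dict:
    """Toy 'processor' step: parse a raw event and normalize it for
    downstream use (field renames, type coercion, defaults).
    The field names here are illustrative, not a real schema."""
    event = json.loads(raw_line)
    return {
        "user": event.get("uid", "anonymous"),   # default for missing users
        "action": event["action"].lower(),       # normalize case
        "ts": int(event["ts"]),                  # coerce string timestamp to int
    }

print(process('{"uid": "42", "action": "CLICK", "ts": "1700000000"}'))
```

In a real pipeline this same function body would run inside a streaming framework, applied to every event as it arrives.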
Analytics is the part that creates some kind of value out of the data. This is the most important part of the pipeline for the business. This is where you take the data and show what’s happening. On the simple side, this could be counting interactions in real-time. On the complex side, this could be a real-time data science or machine learning model.
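Here is what the simple end of that spectrum, counting interactions in real-time, looks like in miniature. This is a toy, single-machine stand-in of my own for what a streaming framework's windowed counts do at scale:

```python
from collections import Counter, deque

class SlidingCounter:
    """Counts interactions per key over a sliding time window.
    A toy illustration; a streaming framework does this distributed
    and fault-tolerant across a cluster."""
    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.events = deque()          # (timestamp, key), oldest first
        self.counts = Counter()

    def record(self, key, now):
        self.events.append((now, key))
        self.counts[key] += 1
        # Expire events that have aged out of the window.
        while self.events and self.events[0][0] < now - self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1

counter = SlidingCounter(window_seconds=60)
counter.record("click", now=0)
counter.record("click", now=10)
counter.record("view", now=50)
counter.record("click", now=100)   # the clicks at t=0 and t=10 fall out of the window
print(counter.counts["click"], counter.counts["view"])
```

The complex end of the spectrum swaps the counter for a model, but the shape is the same: state that updates as each event arrives.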
To move data around and save it, you will need a system for ingestion and dissemination. When you're moving data at Big Data scale and in real-time, the system needs to be able to scale. It needs to deliver data at high speed to the many different systems doing processing and analytics.
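A minimal sketch of the dissemination idea, assuming an in-memory toy bus of my own invention: one ingest point, many independent downstream consumers. In production this role is played by a distributed log such as Apache Kafka, which also handles partitioning, replication, and replay.

```python
from collections import defaultdict

class MiniBus:
    """Toy in-memory publish/subscribe bus (illustrative only)."""
    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Every subscriber gets its own copy of the message, so a
        # processing job and an analytics job consume independently.
        for callback in self.subscribers[topic]:
            callback(message)

bus = MiniBus()
processed, analyzed = [], []
bus.subscribe("clicks", lambda m: processed.append(m.upper()))  # a "processing" consumer
bus.subscribe("clicks", lambda m: analyzed.append(len(m)))      # an "analytics" consumer
bus.publish("clicks", "page_view")
```

The key design point carries over to the real systems: producers don't know or care how many consumers there are, which is what lets a pipeline grow new analytics without touching ingestion.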
Storage is another issue for real-time systems. Storing many small files leads to issues on many Big Data systems. Not all processing and analytics should be done in real-time. You will still need to go back and process in batch. A good storage mechanism is crucial to a real-time data pipeline.
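One common mitigation for the small-files problem is to buffer tiny records and roll them into larger files before writing. Here is a hedged sketch (my own toy class, not a real library) of that idea:

```python
class RollingWriter:
    """Buffers small records and flushes them as one larger file once a
    size threshold is reached -- avoiding the many-small-files problem
    that hurts HDFS-style storage, where each file costs NameNode memory.
    Toy illustration: 'files' are just kept in a list here."""
    def __init__(self, flush_bytes=1024):
        self.flush_bytes = flush_bytes
        self.buffer = []
        self.buffered = 0
        self.files_written = []

    def write(self, record: bytes):
        self.buffer.append(record)
        self.buffered += len(record)
        if self.buffered >= self.flush_bytes:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # In a real pipeline this would be a single HDFS/S3 object write.
        self.files_written.append(b"\n".join(self.buffer))
        self.buffer, self.buffered = [], 0

writer = RollingWriter(flush_bytes=100)
for i in range(50):
    writer.write(b"event-%d" % i)   # fifty tiny records
writer.flush()                      # flush whatever is left at the end
print(len(writer.files_written), "files instead of 50")
```

Real pipelines typically roll on both size and time so data still lands promptly during quiet periods; that is exactly the kind of tradeoff the storage part of the pipeline has to manage.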
Some technologies may be a mix of two or more of these types. This is where things get cloudy. You need to deeply understand each technology and the pieces that are required to create a real-time data pipeline.
This class covers the technologies and concepts you need to know when creating real-time data pipelines. I use my extensive knowledge and experience to teach you what you need to know. I only focus on the technologies I’m seeing in use at companies.
The class is entirely virtual and you can go at your own pace. The course comes with everything you need to get started creating your own real-time data pipelines.
I don’t just cover a few technologies. I show you the open-source and cloud ecosystem of real-time products. This will give you the well-rounded skills that companies want.
Let me share what each chapter covers and teaches:
Chapter 1 – Real-time Data Pipelines
Chapter 2 – Using the Cloud
Chapter 3 – Ingesting Data
Chapter 4 – Kafka
Chapter 5 – Processing Data
Chapter 6 – Spark Streaming
Chapter 7 – Data Products
This class isn’t designed for everyone. To be successful with this class you should:
This class does not:
I’ve been teaching this class extensively at O’Reilly’s Strata conferences and companies around the world. This is because I’m a recognized expert in the field and I was one of the first people teaching real-time Big Data technologies like Apache Kafka and Spark Streaming.
How do you know if this course works? This course already runs at companies. It has taken teams of developers and made them teams of Data Engineers. This course already runs at training facilities. It has already taken students who were Software Developers and made them Data Engineers who got their dream jobs.
Big Data is changing constantly, so how do you know this course is up-to-date? This course already runs at companies, and those companies expect their students to learn from up-to-date materials. The materials and code are updated to the latest versions of CDH. My courses cover current and future technologies. Many of my students are hired because they've learned a future technology that the company wants to start using.
Which technologies should you learn? I've curated and tested this course to teach the technologies and concepts that companies need and are using in production. Even better are the technologies and concepts it doesn't cover. This course leaves out the concepts developers don't need and the technologies that don't make sense or aren't used in practice. Given my industry expertise, we even cover up-and-coming technologies that will set you apart in your job search.
How will you be productive and start coding? Installing Big Data tools is an ordeal unto itself (trust me). You don’t want to waste hours getting things installed and configured before you can even start being productive. I’ve created a virtual machine that gets you up and running quickly. Everything is already installed and configured for you. It has Hadoop, Spark, many ecosystem projects, and Eclipse installed. You just install VirtualBox, import the VM, and you’re ready to go. No wasting time.
How will you practice the skills that you need to master? The course makes heavy use of exercises to practice the skills that you have just learned. There is a full exercise guide that gives you instructions on what to do. These exercises gradually increase in difficulty as you start to master new skills. Each programming exercise has a full sample solution that you can peek at if you get stuck or want to compare your solution with mine. At the end of most modules, there is a final. This final helps you check if you have mastered the skills you need.
Does this course just cover real-time Big Data technologies? This course focuses only on real-time technologies. It only shows batch processing as a means of comparison between batch and real-time. It does show how to use D3, which is a visualization technology.
Do you have to go in order? I highly recommend you go in order. Advanced programmers can skip around if they feel it's necessary, but they will miss important concepts. Skipping around is something you can't do in a live, in-person class.
How long will this class take to complete? This class can be done in 2-3 days of concerted effort. Or it could be done over 1-2 weeks with less time put in.
How does this compare to training from company X? There are various sources of Big Data training out there, with a vast difference in quality, accuracy, and teaching ability. The majority of them are on the lower end of quality. Purchasing a low-quality course isn't just a waste of money; it's a waste of your time, and you won't get the job. Quality training is the difference between success and failure.
Can I get my company to reimburse me? Yes, other students who have purchased this course have had their purchase reimbursed by their company. Many companies have continuing education budgets or new projects have money allotted for training. This is especially true for new and difficult initiatives like Big Data. I will help you however I can to get your purchase reimbursed by your company. Send this PDF to your boss or Human Resources department to convince them to reimburse you.
I stand behind this course 100%. I want you to love this course 100% too. If you don't love this course, I'll give you 100% of your money back. That's right, a 100% money-back guarantee, no matter how deep you are in the course.
Go through the materials. See that they’re the best. Go through the exercises and see yourself becoming the Data Engineer you want to become. I’m confident you’ll be successful.
I’ve built my teaching methods over years of teaching Data Engineering classes. These methods are honed over class after class. No one else is offering classes like these that are so comprehensive. No one else is teaching with such innovative methods. No one else is teaching practical skills.
This course isn't for everyone, as we established before. This course is for people who want to learn real-time Big Data systems. Even within that group, not everyone has the programming skills to create real-time data pipelines, and I understand that. If that's you, I'll give you your money back.
Here is my simple offer: if you don’t love this course within 60 days, I insist that you get 100% of your money back. Guaranteed. Join at the level that’s right for you and see how you can get the real-time Big Data skills you need to get ahead.