I’ve been teaching Kafka at companies that don’t fit the textbook definition of having Big Data problems. They don’t have them now, and they won’t have them in the future. As a result, the students ask me if using Kafka is appropriate for their use cases. Put another way, is Kafka only a Big Data tool?
For most Big Data technologies, such as Apache Hadoop or Apache Spark, not having a Big Data problem now or in the future is reason enough not to use them. It’s a pretty clear pass/fail because the technical and operational overhead of these projects immediately negates any other benefits. Using Big Data for small data isn’t just massive overkill; it’s going to waste a lot of time and money.
For Kafka, it’s different. I define Kafka as a distributed publish-subscribe system. Companies without clear Big Data problems are gaining value from it because they can use Kafka’s other interesting features, even at small scale.
Here are some of the pros I see for using Kafka with small data:
- All data can be replicated to more than one computer
- Kafka removes single points of failure for the brokers
- Kafka removes single points of failure for consumers with consumer groups
- Consumers can move freely through the commit log and go back in time
- Consumers don’t miss data as a result of downtime because the data is saved
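The last two points follow from the commit log design: reading does not remove data, and each consumer only tracks a position (an offset) in the log. Here is a minimal in-memory sketch of that idea in Python. It is a toy model, not Kafka’s client API; the `CommitLog` and `Consumer` classes and their methods are hypothetical names made up for illustration.

```python
class CommitLog:
    """Toy append-only log; illustrates the idea, not Kafka's actual API."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Store a record and return its offset (its position in the log)."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own position, so reading never removes data from the log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        """Read the next records and advance the stored offset past them."""
        records = self.log.read(self.offset, max_records)
        self.offset += len(records)
        return records

    def seek(self, offset):
        """Move to any position in the log, earlier or later."""
        self.offset = offset


log = CommitLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

consumer = Consumer(log)
first_batch = consumer.poll()      # reads the three records written so far

# A record written while the consumer is "down" is simply retained.
log.append("logout")
after_downtime = consumer.poll()   # resumes from the stored offset

# Going back in time is just resetting the offset and reading again.
consumer.seek(1)
replayed = consumer.poll()
```

In this sketch the consumer resumes exactly where it stopped, and `seek` lets it reprocess older records. Real Kafka adds retention limits, so "saved" means saved for as long as the topic’s retention settings allow.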
Here are some of the cons I see for using Kafka compared to a traditional small data pub/sub:
- Programmatic API is more complex than others
- Conceptually more complex (e.g. partitions and offsets) than others
- Ordering is no longer global and is only on a partition basis
- Consumer groups will need to handle state transitions for failures
- Fewer people available with Kafka skills (you will probably need to train)
- Operationally, more processes will need to be monitored
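The ordering point tends to surprise people coming from a traditional pub/sub, so here is a small sketch of why it happens. Records with the same key land in the same partition and keep their relative order, but nothing orders records across partitions. This is a toy model in Python: `NUM_PARTITIONS`, `partition_for`, and `events_for` are hypothetical names, and CRC32 is only a stand-in for whatever hash a real client’s partitioner uses.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for illustration


def partition_for(key: str) -> int:
    """Map a key to a partition; CRC32 is a stand-in hash, not Kafka's."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


# Each partition is its own independent, ordered log.
partitions = defaultdict(list)

events = [
    ("alice", "cart-add"),
    ("bob", "cart-add"),
    ("alice", "checkout"),
    ("bob", "cart-remove"),
    ("alice", "payment"),
]

for key, value in events:
    partitions[partition_for(key)].append((key, value))


def events_for(key: str) -> list:
    """Read one key's records back from the partition that holds them."""
    return [v for k, v in partitions[partition_for(key)] if k == key]


alice_events = events_for("alice")
bob_events = events_for("bob")
```

Each user’s events come back in the order they were produced, because one key always maps to one partition. The relative order of Alice’s and Bob’s events is only preserved if their keys happen to share a partition, so a design that needs a single global order has to use one partition or impose the order some other way.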
With these pros and cons in mind, you can make a choice between Kafka and your small data pub/sub of choice. If the pros are really compelling and outweigh the cons, I suggest you start looking at Kafka. If the cons outweigh the pros, you’re probably better off with your small data pub/sub.
Learn more about how Kafka works here: