How Are Programming and Distributed Systems Different?

by Jesse Anderson | Feb 22, 2017 | Blog, Business, Data Engineering, Data Engineering is hard | 0 comments

In my book Data Engineering Teams, I separate out programming as a different skill than distributed systems. The section is titled “Skills Needed in a Team” and covers the various skills that a data engineering team needs. Several people have emailed me...

Announcement: Data Engineering Teams Book

by Jesse Anderson | Feb 17, 2017 | Blog, Business, Data Engineering, Data Engineering is hard, Magnum Opus | 0 comments

I’m really tired of seeing Big Data projects fail. They fail for both technical and managerial reasons. They all fail for similar reasons and that’s just sad because we can fix or prevent them. Gartner’s research shows that 85% of Big Data projects...

Is Kafka Only a Big Data Tool?

by Jesse Anderson | Feb 8, 2017 | Blog, Business, Data Engineering, Data Engineering is hard | 0 comments

I’ve been teaching Kafka at companies without the textbook definition of Big Data problems. They don’t have, and will not have in the future, what you’d define as Big Data problems. As a result, the students ask me if using Kafka is appropriate for...

What Do I Look for in Data Engineers?

by Jesse Anderson | Feb 1, 2017 | Blog, Business, Data Engineering, Data Engineering is hard | 2 comments

I want to share with you some of the traits that I’ve found in especially good Data Engineers. Not every Data Engineer will have every one of these traits, but you will find several. I can’t stress enough how important it is for a Data Engineer to have a...

Q and A: Ingesting into Hadoop

by Jesse Anderson | Jan 25, 2017 | Blog, Business, Data Engineering, Data Engineering is hard | 0 comments

Today’s blog post comes from a question from a subscriber on my mailing list. The question comes from Guruprasad B.R.: What are the best ways to Ingest data in to Big Data (HBase/HDFS) from different sources like FTP, Web, Email, RDBMS,..etc There are a couple...

Hadoop MapReduce Dedupe Algorithm

by Jesse Anderson | Jan 18, 2017 | Blog, Business, Data Engineering, Data Engineering is hard | 0 comments

In this video, I live code a dedupe algorithm. If you’re not familiar with this algorithm, it takes several data files and removes the duplicates. I show the simple version. Then, I show a more complicated version that adds some custom logic. If you want...

© Jesse Anderson 2022