- Programming is a crucial skill for data engineering, but it's not the only one. System design and creation abilities are equally important.
- Understanding distributed systems and the technologies themselves is necessary to become a professional data engineer.
- Big infrastructure is required for data engineering, and it's important to understand the Big Data infrastructure and the different technologies.
- To practice and learn data engineering, it's recommended to start small on a VM and then run on a large scale in the cloud.
- Providing value at meetups is important, and it's key to show progress in learning. Attendees are potential coworkers, and they want to work with someone who is a quick study and getting better.
Today’s blog post comes from questions sent in by a subscriber on my mailing list, Vaughn S.:
- How is programming used in data engineering?
- What do I have to offer at meetups?
- How can I round out my skillset?
How is programming used in data engineering? I really really want to improve my programming. Stuff like Practice Python is nice, but I want to get familiar with the kinds of things that data engineers do on a day-to-day basis. My understanding is that a lot of DE revolves a lot around ETL, batch processing and data lakes/data warehouses. Don’t you need a big infrastructure for that? Is there any way for a lone student like me to do DE programming practice?
That’s great that you want to improve your programming abilities. You’ll need at least an intermediate-level programming ability. There’s a reason I don’t say advanced. That’s because Big Data won’t challenge your programming abilities; it will challenge your systems design and creation abilities.
When doing data engineering with Java, you’ll need a good, but not exhaustive, knowledge of the language. The same applies in other languages too. You’ll need to know things like lambdas and basic syntax. The closest to advanced syntax would be knowing what the transient keyword means. Otherwise, it’s pretty simple syntactically. (I plan to write a more in-depth post about this one day.)
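To give a rough idea of the level of Java meant here, the sketch below shows a lambda used as a filter and a transient field that is skipped during standard Java serialization. The class and field names (TransientDemo, Event, cachedToken) are hypothetical and chosen only for illustration; this is a minimal sketch using the JDK's built-in serialization, not code from any particular Big Data framework.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TransientDemo {

    // Hypothetical record type, for illustration only.
    static class Event implements Serializable {
        private static final long serialVersionUID = 1L;

        final String name;
        // transient: excluded from Java serialization, so it is null after a round trip.
        transient String cachedToken;

        Event(String name, String cachedToken) {
            this.name = name;
            this.cachedToken = cachedToken;
        }
    }

    // Serialize an Event to bytes and read it back, similar in spirit to
    // what happens when an object is shipped from one machine to another.
    static Event roundTrip(Event event) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(event);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Event) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }

    // Lambdas used to map events to their names and keep those with a given prefix.
    static List<String> namesStartingWith(List<Event> events, String prefix) {
        return events.stream()
                .map(e -> e.name)
                .filter(n -> n.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Event copy = roundTrip(new Event("click", "secret"));
        System.out.println(copy.name);        // prints "click"
        System.out.println(copy.cachedToken); // prints "null" because the field is transient

        List<Event> events = Arrays.asList(new Event("click", null), new Event("view", null));
        System.out.println(namesStartingWith(events, "cl")); // prints "[click]"
    }
}
```

Nothing here is syntactically hard. The harder question is a design one: deciding which fields should travel across the network at all.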
The real difficulty lies in the system design and creation. How do I accomplish this task? Which is the best technology to accomplish this? What are the subtle tradeoffs that make this system better than another? Notice that these aren’t questions about using a hashmap or a linked list. These are much bigger questions and require an in-depth understanding of distributed systems and the technologies themselves. Until you understand and can demonstrate proficiency in these technologies, it’ll be very difficult to get a data engineering job.
On the infrastructure side, yes, you’ll need big infrastructure. Coming at it as a person who is learning Big Data, you’ll need to understand the Big Data infrastructure. You’ll need to understand the 10-30 different Big Data technologies. For your learning, you should start small on a VM (my course comes with this VM) and then run at large scale in the cloud. You want to be able to say that you’ve run algorithms on large datasets.
There are several ways that a lone student can practice and learn the best ways of becoming a Data Engineer. This is why I titled my course Professional Data Engineering. I don’t just teach the technologies behind Big Data. I teach the things that professional Data Engineers do. These practices come from my extensive experience teaching, mentoring teams, and talking to other Data Engineers about what they do. These professional practices put you head and shoulders above others when it comes to jobs.
One of the biggest suggestions I make in my Switching Careers book is that you should use an awesome personal project not just to show your skills, but also to practice what you’ve learned. This practice shows during an interview.
What do I have to offer at meetups? I go to meetups and see data science and data analysis types offering to work on machine learning projects, number crunching projects, etc. Is there a way for me to be useful to them – or maybe to smaller startups? I think I’m still asking the same question as before, i.e., whether data engineering is only useful in a place with a big infrastructure.
Providing value at meetups is important. It’s key to give as much as, if not more than, you get from the meetups. Another key is to show that you’re making progress in your learning. If you come back every month and you’re still talking about the basics, you’ll never be perceived as a target for hiring. If you come back every month showing an increase in skill and expertise, companies will want to hire you.
The attendees of meetups are looking at you as a potential coworker. If you come across as someone who learns slowly or not at all, they won’t want you as a coworker. You will make their job more difficult. If you come across as a quick study and someone who’s getting better, they’ll want you as a coworker. You will make their lives easier.
A big part of this perception comes from your learning materials. If you’re going the cheap route with free or low-priced materials, you will get nothing but beginner information. That will compound the perception that you can’t progress to advanced topics.
All kinds of companies need Data Engineers. These will range from startups to Fortune 100 companies. I know because I’ve taught and mentored companies of all sizes. The common factor is that these organizations have data problems and understand the value of data.
How can I round out my skillset? You mentioned that the course that I was interested in was kind of basic. I took the course anyways because it’s at least a start, but do you know of any resources that could better prepare me to at least do entry level work? It doesn’t have to be free, but anything more than a couple hundred bucks would probably be out of my range. I was thinking of something like this[cheap learning material on data warehousing], maybe. I see that you have [interesting-sounding courses](http://www.bigdatainstitute.io/courses/) on your website, but they seem to be geared towards businesses. I suspect that they’re out of my price range, right?
Knowing the basics won’t get you a job. This is something I talk about in Switching Careers. These courses and videos waste your time and money. This causes you to get discouraged and think it isn’t possible to make the switch. The reality is that you’re going about your switch the wrong way.
The free and cheap route is where people’s data engineering dreams go to die. I’ve spent a good amount of time following up with people who tell me they’re going to somehow learn Big Data through YouTube or a super cheap online education site. These materials all cover the same basics. You won’t come out with an advanced or well-rounded knowledge at all. I’m guessing employers are seeing this in your resumes or interviews and passing on you. This is a good way to get discouraged and stop pursuing your dream (thereby wasting all the time and effort you’ve put in so far).
There’s a good reason my materials cost several thousand dollars. It’s because they are the same world-class materials I use at Fortune 100 companies. They also have a 100% money-back guarantee. They’re proven to take Software Developers and give them the professional data engineering skills to switch careers.
This mindset is why I wrote Switching Careers. I was tired of seeing people fail the same way and not understanding why they were failing.
You will need to actually invest a larger amount in yourself in order to make this switch. This is one of the things that is extremely detrimental to us as programmers. Until you start viewing yourself as your greatest asset, you will get stuck. My strong suggestion is to get serious about your learning and get the materials that are proven to make people data engineers.