Small data engineering teams require different tactics. Much of my writing is geared towards larger companies and teams. How should a startup, or a small data engineering team inside a big company, be set up and run? What, if anything, should be done differently?
Your First Data Engineer
Your first data engineering hire is a crucial decision. They’re going to be the one that gives the team its starting direction. They’re making many of the initial technology decisions. Some of these you can change later and others will be quite difficult to change.
As you go to hire more engineers, your first hire will be the one interviewing and evaluating the rest of the team. Conversely, your interviewees will be evaluating the technology decisions and directions you’ve undertaken. Those decisions will either repel or attract other competent data engineers. Weak engineers will often hire other weak engineers.
The Effects of a Bad Hire
The most obvious difference is sheer size. A single wrong hire can create a disproportionate problem on the team. For example, if a team has 3 total people and 1 of them isn’t competent, you’re only getting about 66% of the team’s productivity. In my experience, the decrease in productivity is more like 50-60% because the incompetent person actually ties up the time of the other people. In Big Data, things are more complicated and the person can’t just Google the answers to every question.
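The arithmetic in that example can be sketched in a few lines of Python. This is only an illustrative model, not a measurement: the function name `effective_output` and the `drag_per_weak_member` parameter are hypothetical, and the model assumes a weak member contributes nothing while also consuming a fixed share of one productive teammate's time.

```python
def effective_output(team_size, weak_members, drag_per_weak_member=0.0):
    """Return the fraction of full-team output that remains.

    Assumption (not from the article): a weak member contributes nothing and
    also consumes `drag_per_weak_member` of one productive teammate's time.
    """
    if team_size <= 0 or not 0 <= weak_members <= team_size:
        raise ValueError("weak_members must be between 0 and team_size")
    productive = team_size - weak_members
    # Time the productive members spend supporting the weak ones.
    lost_to_support = weak_members * drag_per_weak_member
    return max(productive - lost_to_support, 0.0) / team_size


# Nominal case from the text: 3 people, 1 of them not competent, no drag.
nominal = effective_output(3, 1)
# Hypothetical drag: the weak member ties up half of one teammate's time.
with_drag = effective_output(3, 1, drag_per_weak_member=0.5)
print(f"nominal: {nominal:.0%}, with drag: {with_drag:.0%}")
```

With these assumed inputs, the nominal figure is two-thirds of full output, and the drag-adjusted figure drops to one half. The drag value of 0.5 is chosen purely for illustration; the point is that on a team of three, even a modest amount of hand-holding moves the result noticeably.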
Hiring competent people is crucial early on. You may be hitting an ability gap and need to make some changes.
Some companies try to look internally at the people they already have. These companies will have their data warehouse team, DBA, or other SQL-focused person start their Big Data initiatives. In Data Engineering Teams, I talk more about why this is a bad idea. With a small team, the effect is more profound. The missing skills, with no other team members to rely on, will cause either an outright failure or a severely underperforming Big Data initiative. A SQL-focused person is best hired into a larger data engineering team.
A better bet is to look internally at your software engineers. Look for people with multi-threading experience. They’ll have the required programming skills and perhaps the distributed systems experience needed.
No matter what, these people won’t have the skills with Big Data technologies. You’ll need to give them the resources and time to learn the technologies. Too often, small teams are in a hurry and skip the necessary skill acquisition. Skipping this step makes teams get stuck.
Too Many Junior People
Some companies will try to solve their small team woes by throwing many junior engineers at the problem. A group of 10 novices doesn’t average out to 1 expert; a group of 10 novices averages out to 1 novice with the same levels of knowledge and skill. Their skills overlap rather than complement each other. This lack of experience is a killer to productivity on data engineering teams.
The issue with novice designs in Big Data is the magnitude of the mistakes. A bad design doesn’t cost a few days to rewrite. It can take months to fix.
I’ve worked with junior designs and code. A junior person can understand the what, but not the why, or whether the design is correct. More often than not, there are problems with the designs and you might not see them until you’re in production. Junior engineers often copy other people’s designs without understanding the differences or why a design might not work for their use case.
In Data Engineering Teams, I talk about what to do when you lack a veteran on the team. A project veteran is crucial to a data engineering team’s success.
When you only have a few people, you need to make every person count. As soon as you go to production, you’ll have to figure out who should handle operations. I highly recommend small teams use managed services on the cloud. Many of the popular open source technologies have managed cloud products.
Doing this means you don’t have to hire a dedicated operations person. The majority of your operational load can be offloaded onto the managed service. Hire a person with a DevOps background so they can handle both the development and the operational tasks.
Sometimes small teams or companies will eschew outside help. A small team needs to take advantage of whatever resources are available. Getting outside help is a great way to accelerate your team’s productivity.
You can leverage outside help to:
- Provide architecture reviews
- Interview and hire data engineers
- Code the data pipelines
The Chicken-And-Egg Problem of Small Teams
What do you do when you have a data scientist, but not a data engineer? Or what do you do when you have a data engineer, but not a data scientist? Remember that a data engineer has a difficult, if not impossible, task of trying to be a data scientist. Likewise, a data scientist can’t really do the job of a data engineer.
You’ll have to understand each person’s limitations and decide what they can realistically do within them. Otherwise, you’ll end up doing significant rewrites.
Some companies try to find that person with both data engineering and data science skills. These people do exist, but they’re few and far between. They’re also expensive. My suggestion is to leverage outside help to fill these gaps until you can hire someone.
Hiring Your First Data Engineer
All of this advice comes down to hiring your first data engineer. Ideally, you’d hire someone with previous experience as a data engineer. Other desirable traits would be:
- Has put a Big Data project into production
- Was a senior software engineer prior to becoming a data engineer
- Has an established pedigree from another competent company
- Has experience in the same industry or domain as your company to leverage direct previous experience
When interviewing your first hire, you might not be able to tell a good candidate from a bad one on the technical side. You should focus on demonstrable skills during the technical interview.
You also need to get clear about who you’re trying to hire. A data scientist is not a data engineer. Sometimes startups will try to find someone who is both. These people do exist, but are few and far between. If you don’t have a data scientist already, you should hire a data engineer first. You can buy yourself time by getting started on the data engineering required, so the data scientist has data products and data infrastructure to use once they’re hired.
Every team has to start with their first hire. Getting your first hire right is crucial. It sets the pace for the rest of your project.