Data Teams Survey 2023 Follow-Up

Blog Summary: (AI Summaries by Summarizes)
  • Many companies, regardless of size, are using data mesh as a methodology.
  • Smaller companies may not necessarily need a data mesh approach and may not have enough resources to implement it properly.
  • There are no magic bullets or single formulas to choose the right methodology for data teams.
  • There are a surprising number of homegrown methodologies being used by data teams.
  • Homegrown methodologies often involve a mix of different methodologies based on the team's needs.

The results and analysis from my 2023 Data Teams Survey left a few open questions. Let’s revisit these questions with some answers.

Methodologies and Size of Company

Figure 1 – Methodologies Broken Down By Size of Company Using Them

We see a few commonalities across company sizes, as shown in Figure 1. The most striking is how many companies, regardless of size, report using data mesh.

I find it difficult to believe that these smaller companies face problems that genuinely call for a data mesh. I’m also curious whether they have enough people to implement the approach properly.

As you choose a methodology, remember that there are no magic bullets and no single formula to get it right. You can’t just push a shopping cart through the store and pull things off the shelf; each of these methodologies has significant pros and cons to consider.

Homegrown Methodologies

Respondents reported a surprising number of homegrown methodologies. To better understand what they meant by a homegrown methodology, I contacted several of them for more information.

Here are a few anonymous quotes:

“I would say it is a mix of Kanban/Agile/Scrum. The reason being as we have a diverse team with different needs, so we scramble and adjust to be as efficient as we can. I guess you can say we pick the best parts of each methodology whenever we need.”

“For us, the issue is that we have:

  • a fully distributed team
  • minimal schedule of meetings
  • our main access point for deliverables is through GitHub private repos => Azure DevOps
  • customer has domain expertise but little exposure to software engineering
  • customer is unfamiliar with using GitHub

In the past two years, we’ve had only a handful of people in the [remote] office who actually interacted with GitHub issues. We’re not allowed to see process beyond the Azure boundary, and in some cases it involves transfer by hand of source files from the GitHub private repos into an internal GitLab repo which we’re not allowed to see. But we’ve had to evolve a homegrown process that fits both teams.”

“I put homegrown because our methodology is a mix of several of those. Although I work at a company now that has a very immature data process, my experience at other companies at a more mature stage was the same. It seems like the ideas for the major methodologies get adopted in pieces (either what fits or what pieces a manager latches on to) but not fully implemented.

A good example is how many companies try to use agile methodologies, but end up doing it in a waterfall fashion. That isn’t always a bad thing, but getting teams to pick one and fully implement it is a really tough task, even when building everything from scratch. In the software engineering space this is hard, in the data world it seems impossible.

As I am building out the data team in my current role, and have influence over the whole process, I am trying to move towards a dataops methodology, but with limited resources it will likely take a long time to get there. Data engineering is run under the software engineering head, but our projects don’t fit nicely into their sprints. Many of the dataops functions were being handled by the devops team before me, which led to solutions and methods that weren’t optimal for the volume of data we are working with. So with a very small data team it is mostly just trying to organize everything and work on the biggest problems first, while attempting to move toward a more coherent methodology.”

“The workflow was based on a few scrum ceremonies, etc. but without any designated roles, predominantly due to a lack of a management-prescribed methodology. I have since left the organization which, ironically, in addition to the analytics practice I was part of, provided consulting services for Agile implementations. I wasn’t aware of any plans to institute any particular flavor of project management.”

I think these comments highlight a few clear holes in our project management options. We don’t have a ready-made framework that slides right into data science or data engineering, and I believe we’ll continue researching what works best.

As you can see, there isn’t a one-size-fits-all solution for data teams. If you’d like help starting or fixing your team, please contact me.
