
KPIs Every Data Team Should Have

Blog Summary: (AI Summaries by Summarizes)
  • Data teams face unique challenges and require specific KPIs to measure their performance and value creation.
  • Before setting KPIs, it's important to establish baseline numbers to track growth and maturity.
  • KPIs can be broken down to a project/product level.
  • Business-focused KPIs include percent of self-supported queries, business value created, increase in X business metric, and increase in value to the customer.
  • Data engineering-focused KPIs include data product usage, errors per unit/amount, perceived data quality, quantitative data quality, number of useful data products, and number of data model changes.

Measuring data teams can be challenging. Their KPIs (Key Performance Indicators) differ from those of other teams because the way a data team creates value and performs is distinct. I want to share some KPIs that work well for data teams.

Before embarking on this, you’ll want to create baseline numbers for all metrics. This preparation will allow you and your team to track maturity and growth. Without knowing where you started, you won’t know how far you’ve come.

As you look at your projects or data products, you may need to break your KPIs down to a project/product level.



Business-Focused

These are business-focused KPIs.

Percent of self-supported queries – the percentage of queries that the business can run entirely by itself. Data teams should focus on offloading queries and providing infrastructure so the business can run queries on its own.
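As a sketch of how this KPI might be computed, assuming a hypothetical query log where each entry is flagged with whether the business ran it without data-team help:

```python
def self_supported_pct(queries):
    """Percentage of queries the business ran without data-team help.

    `queries` is a list of dicts with a boolean 'self_supported' flag --
    an illustrative log format, not a real schema.
    """
    if not queries:
        return 0.0
    supported = sum(1 for q in queries if q["self_supported"])
    return 100.0 * supported / len(queries)

# Example: 3 of 4 queries ran without data-team involvement.
log = [
    {"query": "sales by region", "self_supported": True},
    {"query": "churn cohort", "self_supported": True},
    {"query": "custom join", "self_supported": False},
    {"query": "daily actives", "self_supported": True},
]
print(self_supported_pct(log))  # 75.0
```

Tracked over time against the baseline, a rising percentage shows the team is successfully shifting routine queries to self-service.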

Business value created – how much business value the data products create. How that value is measured is highly business-dependent.

Increase in X business metric – the increase in a metric that can be directly tied to using data products. This could include metrics such as conversion or average sale price.

Increase in value to the customer – the increase in a metric that the customer can directly see or experience. This could include personalization or service/logistical speed.


Data Engineering-Focused

Some KPIs are more focused on the data engineering side.

Data product usage – how much other teams or programs use your data product. As the organization becomes more data-driven, this number should increase.

Errors per unit/amount – the number of errors per unit of data in your data product. This shows how reliable and correct the data products are.

Perceived data quality – how the users of the data products perceive the data quality. This would be a subjective value.

Quantitative data quality – an objective measure of whether the contents of the data product are of high quality. For example, does an entry have the correct foreign key, and is it within its constraints?
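A minimal sketch of what automated checks like these could look like. The table layout, constraint values, and foreign-key set below are illustrative assumptions, not a real schema:

```python
def check_row(row, valid_customer_ids):
    """Return a list of data quality violations for one order row.

    The field names and constraints here are hypothetical examples.
    """
    violations = []
    # Foreign-key check: the order must reference an existing customer.
    if row["customer_id"] not in valid_customer_ids:
        violations.append("unknown customer_id")
    # Constraint check: quantity must be at least 1.
    if row["quantity"] < 1:
        violations.append("quantity out of range")
    return violations

customers = {101, 102, 103}
rows = [
    {"order_id": 1, "customer_id": 101, "quantity": 2},
    {"order_id": 2, "customer_id": 999, "quantity": 0},
]
for r in rows:
    print(r["order_id"], check_row(r, customers))
# Order 2 fails both the foreign-key and the quantity checks.
```

The KPI itself could then be reported as the percentage of rows with zero violations per run.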

The number of useful data products – the number of data products in use by the rest of the organization. Some organizations will have many data products, but an unknown percentage of those data products are being used.

The number of data model changes – the number of times the data model changes over time. Too many data model changes may indicate a missing role or process for managing data models.


Operations-Focused

The operations team will need to create a set of KPIs around operational excellence.

Outages – the number or percentage of time the data products are inaccessible. This would be a total outage rather than a deterioration of service.

Framework and service uptime – a framework or service loss may not represent a total outage. The operations team will want to track their framework and service uptimes.
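Uptime is commonly reported as the percentage of a reporting period the service was available. A simple sketch, assuming outage durations are recorded in minutes:

```python
def uptime_pct(period_minutes, outage_minutes):
    """Uptime as a percentage of the reporting period.

    `outage_minutes` is a list of outage durations within the period.
    """
    downtime = sum(outage_minutes)
    return 100.0 * (period_minutes - downtime) / period_minutes

# A 30-day month with two outages totaling 90 minutes.
month = 30 * 24 * 60
print(round(uptime_pct(month, [60, 30]), 3))  # 99.792
```

The same calculation can be run per framework or service to separate a total outage from a partial deterioration of service.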

Percent of automation of tasks – the percentage of tasks that are automated. Operations should increase the number of automated tasks for reproducibility and consistency.

Incidents related to data quality – the number of incidents related to a data quality issue. For example, how many outages could be traced back to a data quality issue?


Data Science-Focused

The data science team needs to create specific KPIs around AI and ML.

Reduction of humans in the loop – the reduction in time humans spend on tasks now handled by AI. For example, an inventory decision once made entirely by a human is now augmented with AI to reduce the human’s overall time spent.

Ease of getting data – a subjective view of how easy it is for a data scientist to access a data product through the proper infrastructure. This can be quantified as how much time data scientists spend doing data engineering.

Ease of deploying models – a subjective measurement of how much time or effort it takes to deploy a model into production. For example, this could be the mean time to train and deploy a model for the data science team.


Focusing Data Teams

KPIs help focus data teams on what they need to improve and demonstrate their velocity. By choosing the right metrics and goals, management can show progress. Through that focus, data teams can improve in every aspect of performance.




