For more practical perspectives to help leaders navigate the challenges and opportunities of building a growth company, subscribe to The Ascent.

The term data catalog is often used synonymously with data governance, master data management or data stewardship. A data catalog is, at heart, a list of all of the data sources used by an organization, the tables of data within those sources and the columns of attributes that make up those tables. Along with these listings, data catalogs can contain additional data (metadata) such as data types, typical values, how the data should be used for analysis, and any derived tables that aggregate or combine multiple other pieces of data.
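To make that structure concrete, here is a minimal sketch in Python of a catalog as nested sources, tables and columns with a little metadata attached. Every name in it (`DataSource`, `Table`, `Column`, `find_columns`, and the sample `crm` and `warehouse` entries) is hypothetical and chosen only for illustration; it is not how any particular catalog product models its data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    data_type: str
    description: str = ""
    typical_values: List[str] = field(default_factory=list)


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)
    # Upstream tables this one aggregates or combines, if it is derived.
    derived_from: List[str] = field(default_factory=list)


@dataclass
class DataSource:
    name: str
    tables: List[Table] = field(default_factory=list)


def find_columns(catalog: List[DataSource], keyword: str) -> List[str]:
    """Return 'source.table.column' paths whose name or description mentions keyword."""
    keyword = keyword.lower()
    matches = []
    for source in catalog:
        for table in source.tables:
            for column in table.columns:
                text = f"{column.name} {column.description}".lower()
                if keyword in text:
                    matches.append(f"{source.name}.{table.name}.{column.name}")
    return matches


# Hypothetical sample entries: one SaaS source and one derived warehouse table.
catalog = [
    DataSource("crm", [
        Table("accounts", [
            Column("account_id", "integer", "Unique customer identifier"),
            Column("plan", "text", "Subscription tier", ["free", "pro"]),
        ]),
    ]),
    DataSource("warehouse", [
        Table(
            "customer_journey",
            [
                Column("account_id", "integer", "Customer identifier joined from the CRM"),
                Column("touchpoint_count", "integer", "Number of recorded touchpoints"),
            ],
            derived_from=["crm.accounts"],
        ),
    ]),
]

for path in find_columns(catalog, "customer"):
    print(path)
```

Even a toy model like this shows why the metadata matters: the search matches on descriptions as well as column names, so an analyst can find a field without already knowing what it is called.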

In the past, data catalogs have been a necessary tool for regulatory compliance or a check on development teams. More recently, some data catalog software offerings have added intelligence on the common queries performed on the data, the dashboards that use it, and the machine learning models that depend on it, gathering this by inspecting code rather than relying solely on people to enter it by hand. We believe several trends are converging to raise the importance of data catalogs, making them a category to watch in 2022 and beyond.
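As a rough illustration of what "inspecting code" can mean, the sketch below counts how often each table is referenced across a log of SQL queries. It is deliberately naive and rests on stated assumptions: the query log and table names are invented, and the regular expression only looks for identifiers that follow `FROM` or `JOIN`, so it would miscount common table expressions and quoted identifiers. A real tool would more likely rely on a proper SQL parser.

```python
import re
from collections import Counter

# Naive pattern: an identifier (optionally schema-qualified) after FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def count_table_usage(queries):
    """Count the number of queries that reference each table."""
    usage = Counter()
    for query in queries:
        # Use a set so a table is counted once per query, not once per mention.
        tables = {name.lower() for name in TABLE_REF.findall(query)}
        usage.update(tables)
    return usage


# Hypothetical query log, for illustration only.
query_log = [
    "SELECT * FROM crm.accounts a JOIN support.tickets t ON a.account_id = t.account_id",
    "select plan, count(*) from crm.accounts group by plan",
    "SELECT count(*) FROM support.tickets WHERE status = 'open'",
    "SELECT account_id FROM crm.accounts WHERE plan = 'pro'",
]

usage = count_table_usage(query_log)
for table, n_queries in usage.most_common():
    print(table, n_queries)
```

Usage counts of this kind are one way a catalog could surface which tables colleagues actually rely on, without anyone having to document it manually.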

1. Data Sources are Proliferating

Organizations are analyzing data from a wider variety of sources than ever. This is fueled by the increasing use of SaaS applications across business functions and accelerated by the ease of moving data from these apps into data warehouses or data lakes using SaaS ETL tools. This new paradigm allows companies to use best-of-breed tools for demand generation, customer marketing, salesforce enablement and customer support, and still be able to assemble all of the touchpoints for a given customer into a unified picture of their lifetime customer journey.




2. Machine Learning has Matured

Machine learning, from algorithms to engineering practices, has matured in a way that allows data cataloging tools to include ML capabilities that add useful context to, and recommendations within, an analyst’s workflow. Previous generations of tools often felt like completely different systems, needed specialized knowledge to use, or provided little more than documentation that quickly went out of date. As a result, data catalogs have often failed to gain widespread usage by the people who could benefit from them most. The newer generation of tools promises to meet users where they are already working, for example in the SQL query interface, and to be as easy (and informative) as messaging a coworker on Slack.



3. Engineering and Data Talent are in Demand

Demand for talent and the widely covered impact of the “Great Resignation” reinforce the importance of data catalogs in facilitating knowledge transfer. Data employees tend to develop deep tribal knowledge, often in opaque but critical pieces of the data foundation: when an upstream process changed, the correct filters to apply to obtain financial numbers, the bespoke code that sits behind management dashboards or production models. There are many reasons these employees (and their knowledge) are valued. In our experience, acquiring this knowledge is often a slow process for new employees, and knowledge transfer can be tedious for tenured folks. With the seemingly insatiable demand for new data and engineering talent and the well-documented volume of employee turnover across industries, your team may run the risk of losing unknown amounts of specialized knowledge for good.

Data catalogs can help ease some of this pain by storing in code, and presenting in convenient interfaces, what has previously been locked away in the minds of a few. They promise to accelerate time to impact by helping new team members find the data they need faster and more accurately, and to reduce some of the frustrations of working with data that can themselves contribute to employee turnover.

Data catalogs are difficult to “get right”, but we believe they offer a straightforward promise: to apply data to solve difficult data problems. As you prepare your team to tackle challenges in the year ahead, we encourage you to consider their strategic potential.


Cathy Tanimura is Vice President of Analytics & Data Science at Summit Partners. She works closely with the Summit team and our portfolio companies to help create and execute strategies that effectively use data and analytics to drive better decisions and build better products. She is the author of SQL for Data Analysis: Advanced Techniques for Transforming Data into Insights. Prior to Summit, Cathy worked at Strava, where she built an analytics and data science team focused on product, marketing and business development. Previously, she led analytics teams at Okta and Zynga.
