Osian Llwyd Jones

Stuart's Approach to Data Mesh: Leveraging Discovery Capabilities

Unlocking Data Discovery and Collaboration: How Stuart and CastorDoc are Paving the Way for a Streamlined Data Ecosystem

This customer story was contributed by Osian Llwyd Jones, the Head of Product, Data Platform at Stuart. Owned by GeoPost, Stuart is a leading B2B logistics platform in Europe, operating in more than 100 cities with over 3,500 clients and counting.

Introduction

As the Head of Product for our data platform, I oversee a team of 15 professionals dedicated to building a robust data platform with a product-centric approach. Our overarching goal is to enable every member of the Stuart team to make data-driven decisions as quickly as possible.

To achieve this objective, we focus on empowering our ~100 analysts with the tools and resources they need to generate timely insights that drive our business forward. We understand that the data experience of our analysts has a ripple effect on every aspect of our business.

In 2022, we partnered with Castor to help us improve the data experience across the whole organization. Around 500 users, more than half of the company, interact with the data platform on a monthly basis. We wanted to ensure these people had a smooth and easy experience.

Castor played a crucial role in assisting Stuart with two key aspects of its business: data discovery and collaboration. As we continue our journey towards a data mesh infrastructure, we intend to leverage Castor's capabilities to facilitate our transition.

I - The Challenge: Data Discovery at Stuart

“We conducted an internal survey in 2021, which revealed that data discovery was the top concern among our users” - Osian Llwyd Jones, Head of Product, Data Platform, Stuart

Our data platform currently hosts thousands of datasets in Redshift and dashboards in Tableau, and we aim to maintain this growth trajectory. We strive to encourage stakeholders to create more data and utilize our platform to its full potential.

As our data platform continues to grow, finding relevant data and trusting its quality has become a challenge for stakeholders at Stuart. To gain insight into this pain point, we conducted an internal survey in 2021, which revealed that data discovery was our users' top concern about their overall data experience.

Comments from the survey included, "We’ve onboarded a lot of new users recently, and the lack of documentation is quite painful," and "There is not enough documentation on the tables, so I mostly need to ask teammates what is the meaning of columns." As a data platform team, we found this feedback both surprising and concerning.

The lack of understanding from stakeholders also meant there was an influx of recurring questions to the data team, with users asking where to find assets in the warehouse or the documentation for specific tables. Slack channels became cluttered with inquiries such as "I’m trying to merge data X with data Y. I know about the Y table, but I don’t know what I can join it to. Can someone help?"

This issue became so pressing that stakeholders began creating their own data catalog in Tableau to navigate information better. We realized that this self-initiated catalog was the most compelling evidence that our data discovery process needed to improve.

II - Selecting and deploying the right data catalog solution

“Documentation is a painful process, we thus think it should be crowdsourced as much as possible, just like Wikipedia.” - Osian Llwyd Jones, Head of Product, Data Platform, Stuart

We had some guiding principles for our data catalog selection, and they still hold true today. First, a data catalog needs to foster collaboration and incentivize users to contribute to the documentation process.

Since documenting data can be a tedious and unappealing task, it should not fall solely on one team's shoulders. Instead, we think documentation should be crowdsourced as much as possible, just like Wikipedia.

This is why we sought a tool that prioritized collaboration and encouraged users to contribute. We found Castor to be a perfect fit due to its collaborative design, which was a crucial factor in our decision-making process.

Our search for a data catalog tool took several other factors into account. Alongside the core capabilities of helping users find, understand, and use data, we sought a solution that was compatible with our tech stack and required minimal engineering intervention. From a product manager’s perspective, I also needed our tool to be user-friendly and easy to navigate.

Previously, I had worked with other catalogs such as Amundsen but found it difficult to document effectively with it due to its lack of collaboration features. In contrast, Castor's collaborative capabilities stood out as a major selling point, making it a clear choice for our team.

The implementation process was straightforward. Stakeholders had been yearning for a data catalog, and Castor was easy to use right away. Users were easily onboarded, without any need for convincing.

In order to encourage adoption, we chose to lead by example. As a centralized data team at the time, we prioritized documenting our core tables to an exceptional degree. It was crucial for us to establish the right standard if we were to proceed with crowdsourced documentation.

To further increase usage, we actively redirected users away from Confluence and towards Castor instead. We also hosted Castor demos and roadshows to educate users on the benefits of the tool and how it could address the pain points they had previously faced.

Overall, the smooth adoption of Castor was due to the tool's ability to fill a longstanding need in our organization, without replacing any existing tools or workflows.

III - Results & Roadmap

“If we want to grow in a distributed model, we need documentation and good data understanding” - Osian Llwyd Jones, Head of Product, Data Platform, Stuart

Since implementing Castor as our data catalog solution, Stuart has experienced significant improvements in data discovery.

We've been able to streamline and standardize our data documentation, resulting in improved data discovery and usage for our stakeholders. We've also seen a significant reduction in data-related inquiries, which I would estimate at a 50% decrease over the past year.

We track this progress using Slack workflows that allow users to submit inquiries via an easy-to-use form, which enables us to monitor the impact of Castor and other data platform enhancements on a month-to-month basis. With a stable base of 120 monthly active users of Castor, we can attribute a significant portion of this progress to the platform's collaborative and user-friendly design.
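To illustrate how this kind of month-to-month tracking could work, here is a minimal sketch in Python. It assumes the form submissions can be exported as a list of submission dates; the function names and the sample dates are hypothetical and are not Stuart's actual tooling or figures.

```python
from collections import Counter
from datetime import date


def monthly_counts(submission_dates):
    """Count inquiries per (year, month) from a list of submission dates."""
    return Counter((d.year, d.month) for d in submission_dates)


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent (negative = decrease)."""
    if before == 0:
        raise ValueError("cannot compute a relative change from zero inquiries")
    return (after - before) / before * 100


# Hypothetical sample data: four inquiries in one month, two in the next.
sample = [
    date(2022, 1, 3), date(2022, 1, 10), date(2022, 1, 17), date(2022, 1, 24),
    date(2022, 2, 7), date(2022, 2, 21),
]
counts = monthly_counts(sample)
change = percent_change(counts[(2022, 1)], counts[(2022, 2)])
print(f"Month-over-month change in inquiries: {change:.0f}%")
```

Comparing the same calendar month year over year, rather than consecutive months, would reduce the effect of seasonality on such an estimate.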

Looking ahead, we aim to leverage Castor's capabilities to facilitate a shift towards a more distributed data infrastructure, in line with the data mesh model. Currently, our analytics team has too much ownership over the data platform, creating a perception that they will take care of everything. To address this issue, we plan to make our data platform fully self-service, which requires comprehensive documentation, data ownership, and clear points of contact.

In a distributed data infrastructure, each team or department would have greater control and ownership over their respective data assets, allowing them to use the data more effectively and efficiently. To support this transition, we need to ensure that all stakeholders have a clear understanding of the data available to them and how to use it.

By working closely with Castor, we believe that we can achieve these objectives and maximize the value of our data.

About us

We write about all the processes involved when leveraging data assets: from the modern data stack to data teams composition, to data governance. Our blog covers the technical and the less technical aspects of creating tangible value from data.

At Castor, we are building a data documentation tool for the Notion, Figma, Slack generation.

Or data-wise for the Fivetran, Looker, Snowflake, DBT aficionados. We designed our catalog software to be easy to use, delightful and friendly.

Want to check it out? Reach out to us and we will show you a demo.
