If AI is The Far West - Who’s the Sheriff?
How to finally deliver on AI’s promises of a reliable AI Assistant
Introduction
I’d like to discuss the undelivered promise of AI: the AI-powered data assistant.
When ChatGPT was released, many companies were quick to promise data assistants or data “co-pilots” that would finally empower us all with data and help us reach the self-service dream.
Yet the revolution has been slow to materialize, at least in the data world. Benn Stancil's observations resonate here: creating chatbots is one thing, but crafting ones that genuinely transform how we interact with data is another. Aligning all the puzzle pieces to create a coherent, functional AI assistant is far more complex than we initially thought.
So, why is the gap between the data assistant promise and the reality so large? As things stand, there is no viable data assistant on the market.
The reason is that it’s not as simple as plugging a chatbot into your data warehouse. The data landscape of many organizations is chaotic, and letting AI feed on a chaotic environment only multiplies the chaos.
But the solution is straightforward. Just like a town needs a good sheriff, a data assistant needs good governance. So, hats on, and let’s dive in 🤠
In this piece, we’ll explore the reasons behind the struggle to create reliable data assistants and outline steps to make this dream a reality. Naturally, we’ll focus on the role of data governance in this initiative.
I - AI and The Failed Promise of Data Assistants
Before we delve into the specifics of crafting a reliable AI data assistant, let’s first clarify the concept.
What capabilities would make a data assistant genuinely valuable? If we were to sketch the ideal profile of an AI assistant, which qualities would it possess?
This section is a brief overview of what we are looking for in a data assistant, and why we have failed to deliver on the promise yet.
A - The Case for Data Assistants
Most of the AI chatbots that sprouted up right after the ChatGPT release seem more about hopping on the AI bandwagon than delivering genuine value.
When it comes to data, a reliable assistant would bring a solution to a longstanding issue: data self-service.
Most companies aim to empower more stakeholders to autonomously use data while also maintaining control over this data.
AI is well suited to answer this promise because it can guide stakeholders in their data initiatives.
An AI assistant has the potential to remove the technical barrier to using data. Data analysts use SQL to query data. AI has the potential to explain SQL, assist stakeholders in writing SQL, and even teach SQL.
As a marketer who (sometimes) tries to work with data, I dream of an assistant that could answer questions such as:
- Who are our top 50 users?
- Which blog post resulted in the highest number of conversions last year?
- What is the quality of leads generated from conferences, and what’s their conversion rate?
If I had an AI guide to point me in the right direction or even provide answers directly, it would be very valuable. It would save me from bothering five different team members for insights.
It would make my marketing efforts more precise. It would allow me to plan better at the quarter's start and execute my strategy more effectively. Chances are, it would help folks in other departments, too.
B - Why has the dream not materialized yet?
“When I ask “How many new accounts did we have in Europe last quarter?,” the bot has to determine what I mean by accounts. […] It has to make a bunch of assumptions about what “new” means. It has to do all of this on messy data, often spread across thousands of constantly-changing tables that are undocumented, poorly named, and full of ambiguous and seemingly duplicative columns.” Benn Stancil
As illustrated by the quote above, we run into a lot of obstacles in our attempt to bring data assistants to life.
Two key obstacles prevent AI assistants from bringing value to the table. Firstly, these assistants lack business context—definitions, calculation formulas, and key performance indicators. Without this, they struggle to interpret and act on our queries accurately.
Secondly, they're cut off from essential metadata. This metadata would enable them to track data provenance, understand how data sets are connected, who owns what, and so on.
Why don’t they have access to this knowledge? Simply because organizations don't prioritize developing, updating, or making this information easily accessible.
So you've guessed it - the dream of reliable AI Assistants can be made possible through robust data governance, as we’ll explore next.
II - Bring on the Sheriff
This part focuses on what it takes to build a data assistant that creates value instead of being another “build a quick chatbot over our data” initiative that ends up spreading poor information.
Three steps are needed to develop a data assistant that creates value. First, you must develop thorough documentation of the business knowledge within your organization. Second, your data must be augmented with metadata. Finally, these two aspects should be integrated within a robust governance framework. I recommend approaching this process progressively because each initiative is already valuable in itself.
A - Cultivating Business Knowledge
Dreaming of an AI that answers all your data questions is one thing; realizing the dream is another - particularly if your organization hasn't defined its business metrics.
Establishing these metrics is critical—it's essentially how you introduce your AI to the language of your business. Without this understanding, the AI cannot interpret your questions nor give you accurate answers aligned with your business.
Data is inextricably linked to the essential business concepts that form the basis of any organization, such as client, product, invoice, payment, and so on. Having a clear idea of these concepts and how they interact with each other is essential.
For AI to answer "Which article got the most conversions last year?", it needs clear definitions. What counts as a conversion? A demo sign-up or an email submission? Without clearly set metrics, AI won't understand what to look for.
So, start by defining your metrics and KPIs clearly. This does two things: First, it lays the groundwork for building a fully operational data assistant. Secondly, and just as importantly, even a basic AI built on this foundation brings immediate value. It becomes a tool for everyone in your organization to access unified, accurate information on key business metrics.
Imagine anyone in your organization could query a chatbot with questions like:
- How do we define MRR?
- What’s the formula we use to calculate ARR?
- What are the criteria for a lead to be considered 'qualified'?
A lot of organizations still suffer from siloed knowledge. Too often, two different departments calculate the same metric differently. If you build a single source of truth for metrics and KPIs, and build an AI chatbot on top of it, you're making sure everyone is on the same page. This kind of consistency is gold for any company.
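To make the idea of a single source of truth more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `MetricDefinition` fields, the `GLOSSARY` entries, the formulas, and the owners are hypothetical and do not describe any particular product. The point is only that a chatbot answering "How do we define X?" should look the term up in one shared place and admit when no agreed definition exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """One agreed definition of a business metric (hypothetical structure)."""
    name: str
    description: str
    formula: str
    owner: str


# Hypothetical entries; real definitions would come from your own documentation.
GLOSSARY = {
    "arr": MetricDefinition(
        name="ARR",
        description="Annual recurring revenue from active subscriptions.",
        formula="SUM(monthly_recurring_revenue) * 12",
        owner="finance",
    ),
    "qualified lead": MetricDefinition(
        name="Qualified lead",
        description="A lead that has requested a demo.",
        formula="COUNT(leads WHERE demo_requested)",
        owner="marketing",
    ),
}


def define(term: str) -> str:
    """Return the single agreed definition of a term, or say that none exists."""
    entry = GLOSSARY.get(term.strip().lower())
    if entry is None:
        return f"No agreed definition for '{term}' yet."
    return (
        f"{entry.name}: {entry.description} "
        f"Formula: {entry.formula} (owner: {entry.owner})"
    )


print(define("ARR"))
print(define("churn"))
```

Because every department reads from the same glossary, two teams asking about the same metric get the same formula, and an undefined term surfaces as a gap to fix rather than as a guess.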
You can find our top tips on documenting business knowledge in this piece.
B - Metadata is king
“In this coming era of AI and LLMs, metadata quality will be as important as data quality. LLM applications need rich, high quality metadata in order to use data. It is the case that they can't reliably use data without metadata.” David Jayatillake
Business knowledge is one part of the puzzle. The other essential component is metadata, which provides the necessary context around data. Allowing AI to process and share uncontextualized data throughout an organization leads to widespread misinterpretation and confusion.
Data quality has been a point of concern for a while now. Yet for those aiming to craft a trustworthy data assistant, metadata quality should take center stage.
When AI knows the full story behind the data, it can explain its reasoning. For example, it can tell you why it chose a certain table for analysis and say "I used this table because it has been certified and was used by 50 people in the company last week."
Keeping all this detailed and accurate metadata easy to find means AI can be more transparent, which helps users trust not only the AI's decisions but also the data it uses.
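As a rough sketch of how metadata could drive both the choice of a table and the explanation for it, consider the Python below. The `TableMetadata` fields, the table names, and the ranking rule (certified tables first, then the most used) are assumptions for illustration, not a description of how any specific assistant works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMetadata:
    """Minimal metadata about a table (hypothetical fields)."""
    name: str
    certified: bool
    users_last_week: int


def choose_table(candidates):
    """Pick the best candidate table and return it with a plain-language reason."""
    if not candidates:
        raise ValueError("no candidate tables")
    # Assumed rule: certified tables win, ties are broken by recent usage.
    best = max(candidates, key=lambda t: (t.certified, t.users_last_week))
    status = "has been certified" if best.certified else "is not certified"
    reason = (
        f"I used {best.name} because it {status} and was used by "
        f"{best.users_last_week} people last week."
    )
    return best, reason


tables = [
    TableMetadata("analytics.users_v1", certified=False, users_last_week=120),
    TableMetadata("analytics.users", certified=True, users_last_week=50),
]
best, reason = choose_table(tables)
print(reason)
```

The explanation is built from the same metadata that drove the decision, so what the assistant says about its choice cannot drift from what it actually did.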
If enriching your data with metadata is the second stepping stone toward a reliable data assistant, it can also be an end goal in itself.
An AI built on a foundation of comprehensive metadata brings tremendous value. It becomes a tool for guiding everyone in your organization around the company’s data landscape.
Imagine anyone in your organization could query a chatbot with questions like:
- What are the known data quality issues or gaps in this dataset?
- What process was used to collect this data?
- Which datasets are related to this one?
- How frequently is this data updated, and what's the next scheduled update?
- Are there restrictions on using this data for certain types of analysis?
An AI that leverages metadata to guide stakeholders in their research is incredibly valuable. Many companies suffer from a lack of trust in the data, and discoverability issues. But if you have an AI that can guide people to the right, trustworthy data in a transparent manner, you're building a culture of trust in the data.
C - All of It, in the Right Place
Let’s briefly recap. In this piece, we covered the fact that a functional data assistant needs:
- Business knowledge: This includes a complete understanding of the business context.
- Metadata: This is the context around the data, detailing its origin, structure, and characteristics.
However, these elements can help the data assistant deliver value only if they are leveraged simultaneously. An assistant that can only access business knowledge is interesting, but it cannot provide data answers. Conversely, an assistant that can only access metadata is helpless when it comes to answering business questions.
Only together do these two elements unlock their full potential, making your AI system a real vehicle for building self-service in the organization.
When both are used together, your AI can correlate business context with metadata. This makes it adaptable to changes in business definitions or shifts in the data architecture, because its output is always based on the latest business context and metadata.
Let’s come back to my wish of asking the AI assistant who the top 50 users of our product are. If the AI assistant can simultaneously utilize both business knowledge and metadata, it has the capability to:
- Interpret the latest definition of a 'customer' based on contract value, loyalty, or other criteria.
- Use metadata to pinpoint the most relevant, trustworthy, and compliant source of information to answer this question.
- Adjust its methodologies in response to updates in business logic or the creation of new data assets.
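The three capabilities above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `glossary` and `catalog` structures, the table names, and the `rank_by` field are all hypothetical. The sketch reads both the business definition and the metadata at the moment a question is asked, so a change in either one changes the plan.

```python
def plan_answer(term, glossary, catalog):
    """Combine a business definition with metadata to plan how to answer a question."""
    definition = glossary.get(term)
    if definition is None:
        return f"Cannot answer: '{term}' has no agreed business definition."
    # Keep only certified tables documented as describing this concept.
    sources = [t for t in catalog if t["certified"] and term in t["concepts"]]
    if not sources:
        return f"Cannot answer: no certified source describes '{term}'."
    # Among trustworthy sources, prefer the one people actually use.
    best = max(sources, key=lambda t: t["users_last_week"])
    return f"Rank '{term}' by {definition['rank_by']} using {best['name']}."


# Hypothetical business knowledge and metadata.
glossary = {"top user": {"rank_by": "contract_value"}}
catalog = [
    {"name": "crm.accounts", "certified": True,
     "concepts": ["top user"], "users_last_week": 40},
    {"name": "tmp.accounts_copy", "certified": False,
     "concepts": ["top user"], "users_last_week": 90},
]

print(plan_answer("top user", glossary, catalog))

# When the business changes its definition, the plan follows automatically.
glossary["top user"]["rank_by"] = "weekly_sessions"
print(plan_answer("top user", glossary, catalog))
```

Note that the assistant refuses to answer when either piece is missing. With only the glossary it knows what to rank by but not where to look, and with only the catalog it knows where to look but not what "top" means.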
So - for a data assistant to provide reliable answers, the knowledge and metadata elements need to be integrated within a strong governance framework so that they can be leveraged simultaneously by the AI.
This approach highlights the symbiotic relationship between AI and data governance. The governance pillar, including business knowledge and metadata management, ensures that the Data Assistant can be trusted. Conversely, the data assistant maximizes the impact of the governance effort by empowering all stakeholders to leverage data.
Conclusion
Although many initiatives have tried to leverage AI for self-service, there is no reliable data assistant to date.
To build a functional data assistant, companies need to leverage their business knowledge and their metadata simultaneously. This gives AI the right context to provide accurate, trustworthy answers to stakeholders of all technical levels.
This means that your AI initiative should be rooted in a strong data governance framework - whereby business knowledge is documented and accessible, and metadata is meticulously managed.
At CastorDoc, we have been helping companies build the most comprehensive knowledge repository and cultivate their metadata in an automated manner. Now that we are sitting on a gold mine of business knowledge and metadata, we have built the CastorDoc assistant on top of it, helping companies bring about a culture of self-service while keeping some level of control over the data. If this sounds like something you would like to explore, get in touch with the team.