Is your AI product really working? How to develop the right metric system


In my first stint as a machine learning (ML) product manager, a simple question sparked passionate debates across functions and leaders: how do we know if this product is actually working? The product I managed served both internal and external customers. The model enabled internal teams to identify the top issues faced by our customers so that they could prioritize the right set of experiences to fix those issues. With such a complex web of internal and external customers, choosing the right metrics to capture the product's impact was critical to steering it toward success.

Not tracking whether your product is working well is like landing a plane without any instructions from air traffic control. There is no way you can make informed decisions for your customers without knowing what is going right or wrong. Additionally, if you do not actively define the metrics, your team will come up with their own backup metrics. The risk of having many flavors of an ‘accuracy’ or ‘quality’ metric is that everyone develops their own version, leading to a situation where you are not all working toward the same outcome.

For example, when I reviewed my annual goal and its underlying metric with my engineering team, the immediate response was: “But this is a business metric; we already track precision and recall.”

First, identify what you want to know about your AI product

Once you get down to the task of defining the metrics for your product, where do you start? In my experience, the complexity of operating an ML product with many customers carries over to defining metrics for the model, too. What do I use to measure whether a model is working well? Measuring the outcome of internal teams prioritizing launches based on our model would not be quick enough; measuring whether the customer adopted the solution recommended by our model risked drawing conclusions from a very broad adoption metric (what if the customer did not adopt the solution because they just wanted to reach a support agent?).

Fast-forward to the era of large language models (LLMs), where we no longer have a single output from one ML model: we also have text answers, images and music as outputs. The dimensions of the product that require metrics now multiply rapidly: formats, customers, types ... the list goes on.

Across all my products, when I try to come up with metrics, my first step is to distill what I want to know about the product's impact on customers into a few key questions. Identifying the right set of questions makes it easier to identify the right set of metrics. Here are a few examples:

Did the customer get an output? → Metric for coverage
How long did the product take to provide an output? → Metric for latency
Did the user like the output? → Metrics for customer feedback, customer adoption and retention

Once you have identified your key questions, the next step is to identify a set of sub-questions for ‘input’ and ‘output’ signals. Output metrics are lagging indicators: they measure an event that has already happened. Input metrics, or leading indicators, can be used to identify trends or predict outcomes. See below for how to extend the questions above with the right sub-questions for lagging and leading indicators. Not every question needs leading/lagging indicators.

Did the customer get an output? → coverage
How long did the product take to provide an output? → Latency
Did the user like the output? → Customer feedback, customer adoption and retention
1. Did the user indicate that the output is correct/wrong? (Output)
2. Was the output good/fair? (Input)
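One way to keep the key questions, their sub-questions and the input/output tagging in one shared place (so the team does not drift into its own backup metrics) is to write them down as a small data structure. This is a minimal sketch using only the Python standard library; the names `KeyQuestion`, `SubQuestion`, `Nature` and `leading_indicators` are hypothetical, not part of any established framework.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Nature(Enum):
    """Whether a signal is a lagging (output) or leading (input) indicator."""
    OUTPUT = "output"  # lagging: measures an event that already happened
    INPUT = "input"    # leading: helps spot trends or predict outcomes


@dataclass
class SubQuestion:
    text: str
    nature: Nature


@dataclass
class KeyQuestion:
    text: str
    metric_area: str
    sub_questions: List[SubQuestion] = field(default_factory=list)


# The three key questions from the list above, with their sub-questions.
QUESTIONS = [
    KeyQuestion("Did the customer get an output?", "coverage"),
    KeyQuestion("How long did the product take to provide an output?", "latency"),
    KeyQuestion(
        "Did the user like the output?",
        "customer feedback, customer adoption and retention",
        [
            SubQuestion("Did the user indicate that the output is correct/wrong?", Nature.OUTPUT),
            SubQuestion("Was the output good/fair?", Nature.INPUT),
        ],
    ),
]


def leading_indicators(questions: List[KeyQuestion]) -> List[str]:
    """Return the text of every sub-question tagged as an input (leading) signal."""
    return [
        sub.text
        for q in questions
        for sub in q.sub_questions
        if sub.nature is Nature.INPUT
    ]


if __name__ == "__main__":
    for q in QUESTIONS:
        print(f"{q.text} -> {q.metric_area}")
        for sub in q.sub_questions:
            print(f"    {sub.text} ({sub.nature.value})")
```

Questions without sub-questions simply keep an empty list, matching the point that not every question needs leading/lagging indicators.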

The third and final step is to identify how to collect the metrics. Most metrics are collected at scale through new instrumentation built by data engineering. However, in some cases (e.g., question 3 above), especially for ML-based products, you have the option of manual or automated evaluations that assess the model's outputs. While it is always best to develop automated evaluations, starting with manual evaluations for “Was the output good/fair?” and creating a rubric that defines good, fair and not good will help you lay the groundwork for a rigorous and tested automated evaluation process.
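As a minimal sketch of the manual-to-automated path, assume reviewers assign exactly one rubric label (‘good’, ‘fair’ or ‘not good’) to each output. The hypothetical helpers below compute the share of outputs rated good/fair and, once an automated evaluator exists, how often its labels agree with the human ones. Plain percent agreement is only one possible check, chosen here for brevity; chance-corrected measures such as Cohen's kappa are a common alternative.

```python
from collections import Counter
from typing import List

# Rubric labels. What each label means is defined in your written rubric.
RUBRIC_LABELS = ("good", "fair", "not good")


def validate(labels: List[str]) -> None:
    """Reject empty input and any label that is not in the rubric."""
    if not labels:
        raise ValueError("no labels to evaluate")
    unknown = set(labels) - set(RUBRIC_LABELS)
    if unknown:
        raise ValueError(f"labels outside the rubric: {sorted(unknown)}")


def pct_good_or_fair(labels: List[str]) -> float:
    """Share of outputs rated 'good' or 'fair', as a percentage."""
    validate(labels)
    counts = Counter(labels)
    return 100.0 * (counts["good"] + counts["fair"]) / len(labels)


def agreement_pct(manual: List[str], automated: List[str]) -> float:
    """Percent of outputs where the automated evaluator's label matches the human label."""
    validate(manual)
    validate(automated)
    if len(manual) != len(automated):
        raise ValueError("manual and automated label lists must be the same length")
    matches = sum(m == a for m, a in zip(manual, automated))
    return 100.0 * matches / len(manual)


manual = ["good", "fair", "not good", "good"]     # human reviewers, per rubric
automated = ["good", "good", "not good", "good"]  # hypothetical automated evaluator
print(pct_good_or_fair(manual))          # 75.0
print(agreement_pct(manual, automated))  # 75.0
```

Keeping the human labels around after automation ships gives you a standing test set for the automated evaluator.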

Example use cases: AI search, listing descriptions

The framework above can be applied to any ML-based product to identify its list of primary metrics. Let's take search as an example.

Question	Metric	Nature of metric
Did the customer get an output? → Coverage	% of search sessions with search results shown to the customer	Output
How long did the product take to provide an output? → Latency	Time taken to display search results to the user	Output
Did the user like the output? → Customer feedback, customer adoption and retention
1. Did the user indicate that the output is correct/wrong? (Output)	% of search sessions with ‘thumbs up’ feedback on the search results, or with clicks, from the customer	Output
2. Was the output good/fair? (Input)	For each search term, % of search results marked ‘good/fair’ per quality rubric	Input
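To make the search table concrete, here is a sketch of how its metrics might be computed from session logs. The `SearchSession` fields, the use of nearest-rank p50/p95 for latency, and counting a session as positive if it has either a thumbs up or a click are all assumptions for illustration; your logging schema and metric definitions will differ.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SearchSession:
    """One search session. Field names are hypothetical; map them to your own logs."""
    results_shown: bool
    latency_ms: Optional[float] = None  # time to display results; None if nothing shown
    thumbs_up: bool = False
    clicked: bool = False


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile (p in 1..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def search_metrics(sessions: List[SearchSession]) -> Dict[str, Optional[float]]:
    """Output (lagging) metrics from the search table."""
    if not sessions:
        raise ValueError("need at least one session")
    n = len(sessions)
    shown = [s for s in sessions if s.results_shown]
    latencies = [s.latency_ms for s in shown if s.latency_ms is not None]
    return {
        # Coverage: % of sessions where results were shown
        "coverage_pct": 100.0 * len(shown) / n,
        # Latency: p50/p95 time to display results, over sessions that showed results
        "latency_p50_ms": percentile(latencies, 50) if latencies else None,
        "latency_p95_ms": percentile(latencies, 95) if latencies else None,
        # Feedback: % of all sessions with a thumbs up or a click
        "feedback_pct": 100.0 * sum(s.thumbs_up or s.clicked for s in sessions) / n,
    }


def good_fair_by_term(labels_by_term: Dict[str, List[str]]) -> Dict[str, float]:
    """Input (leading) metric: per search term, % of rated results labelled good or fair."""
    return {
        term: 100.0 * sum(label in ("good", "fair") for label in labels) / len(labels)
        for term, labels in labels_by_term.items()
        if labels  # skip terms with no rated results
    }


sessions = [
    SearchSession(True, 120.0, thumbs_up=True),
    SearchSession(True, 300.0, clicked=True),
    SearchSession(True, 200.0),
    SearchSession(False),
]
print(search_metrics(sessions))
# {'coverage_pct': 75.0, 'latency_p50_ms': 200.0, 'latency_p95_ms': 300.0, 'feedback_pct': 50.0}
```

Reporting tail latency (p95) alongside the median is a design choice here, since a fast median can hide a slow experience for a meaningful share of users.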

How about a product that generates descriptions for listings (whether a menu item on DoorDash or a product listing on Amazon)?

Question	Metric	Nature of metric
Did the customer get an output? → Coverage	% of listings with a generated description	Output
How long did the product take to provide an output? → Latency	Time taken to generate descriptions for the user	Output
Did the user like the output? → Customer feedback, customer adoption and retention
1. Did the user indicate that the output is correct/wrong? (Output)	% of listings with generated descriptions that required edits from the technical content team/seller/customer	Output
2. Was the output good/fair? (Input)	% of listing descriptions marked ‘good/fair’, per quality rubric	Input
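A similar sketch for generated listing descriptions, again over a hypothetical per-listing record (`Listing` and its fields are assumptions, not a real schema, and using the median for generation time is an illustrative choice). The edit rate uses only listings that actually received a generated description as its denominator, so coverage and quality are not conflated.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Listing:
    """Hypothetical per-listing record; adapt the fields to your own data."""
    listing_id: str
    description_generated: bool
    generation_seconds: Optional[float] = None
    edited_by: Optional[str] = None  # e.g. "content_team", "seller", "customer"; None if unedited


def listing_metrics(listings: List[Listing]) -> Dict[str, Optional[float]]:
    """Output (lagging) metrics from the listing-description table."""
    if not listings:
        raise ValueError("need at least one listing")
    generated = [item for item in listings if item.description_generated]
    times = [item.generation_seconds for item in generated if item.generation_seconds is not None]
    edited = [item for item in generated if item.edited_by is not None]
    return {
        # Coverage: % of all listings that received a generated description
        "coverage_pct": 100.0 * len(generated) / len(listings),
        # Latency: median seconds to generate a description
        "median_generation_s": statistics.median(times) if times else None,
        # Feedback: % of generated descriptions that someone had to edit
        "edit_rate_pct": 100.0 * len(edited) / len(generated) if generated else None,
    }


listings = [
    Listing("a", True, generation_seconds=2.0),
    Listing("b", True, generation_seconds=4.0, edited_by="seller"),
    Listing("c", False),
]
print(listing_metrics(listings))
```

The `edited_by` field is there so the same edit rate can later be split by who made the edit, if that distinction matters for your product.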

The approach described above extends to many ML-based products. I hope this framework helps you define the right set of metrics for your ML model.

Sharanya Rao is a group product manager.
