In my first stint as a machine learning (ML) product manager, a simple question sparked passionate debate across teams and leaders: how do we know if this product is actually working? The product I managed served both internal and external customers. The model enabled internal teams to identify the top issues faced by our customers so that they could prioritize the right set of experiences to fix those issues. With such a complex web of interdependencies among internal and external customers, choosing the right metrics to capture the impact of the product was critical to steering it towards success.
Not tracking whether your product is working well is like flying an aircraft without any instructions from air traffic control. There is absolutely no way that you can make informed decisions for your customer without knowing what is going right or wrong. Additionally, if you do not actively define the metrics, your team will identify their own back-up metrics. The risk of having multiple flavors of an 'accuracy' or 'quality' metric is that everyone will develop their own version, leading to a situation where you are not all working towards the same outcome.
For example, when I reviewed my annual goal and the underlying metric with my engineering team, the immediate response was: "But this is a business metric, we already track precision and recall."
First, identify what you want to know about your AI product
Once you get down to the task of defining the metrics for your product, where do you start? In my experience, the complexity of operating an ML product with multiple customers carries over to defining metrics for the model. What do I use to measure whether a model is working well? Measuring whether internal teams prioritized launches based on our model would not give a signal quickly enough; measuring whether customers adopted the solutions recommended by our model risks drawing conclusions from a very broad adoption metric (what if the customer did not adopt the solution because they just wanted to reach a support agent?).
Fast-forward to the era of large language models (LLMs), where we no longer have just a single output from an ML model; we also have text answers, images and music as outputs. The dimensions of the product that require metrics now increase rapidly: formats, customers, type … the list goes on.
Across all my products, when I try to come up with metrics, my first step is to distill what I want to know about the product's impact on customers into a few key questions. Identifying the right set of questions makes it easier to identify the right set of metrics. Here are a few examples:
- Did the customer get an output? → Metric for coverage
- How long did the product take to provide an output? → Metric for latency
- Did the user like the output? → Metrics for customer feedback, customer adoption and retention
Once you identify your key questions, the next step is to identify a set of sub-questions for 'input' and 'output' signals. Output metrics are lagging indicators: they measure an event that has already happened. Input metrics are leading indicators: they can be used to identify trends or predict outcomes. See below for ways to add the right sub-questions for lagging and leading indicators to the questions above. Not every question needs a leading or lagging indicator.
- Did the customer get an output? → coverage
- How long did the product take to provide an output? → latency
- Did the user like the output? → customer feedback, customer adoption and retention
  - Did the user indicate that the output is right/wrong? (Output)
  - Was the output good/fair? (Input)
The third and final step is to identify the method for gathering the metrics. Most metrics are gathered at scale through new instrumentation built by data engineering. However, in some instances (such as question 3 above), especially for ML-based products, you have the option of manual or automated evaluations that assess the model's outputs. While it is always best to develop automated evaluations, starting with manual evaluations for "was the output good/fair" and creating a rubric that defines good, fair and not good will help you lay the groundwork for a rigorous and tested automated evaluation process.
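As a rough illustration of the manual-evaluation starting point, the sketch below aggregates reviewer labels into a single "good/fair" rate. The label names, the `good_or_fair_rate` helper and the sample data are all assumptions made for this example; a real rubric would also spell out what each label means.

```python
from collections import Counter

# Hypothetical rubric labels; in practice each needs a written definition.
RUBRIC_LABELS = ("good", "fair", "not good")


def good_or_fair_rate(labels):
    """Return the share of reviewed outputs labelled 'good' or 'fair'."""
    unknown = set(labels) - set(RUBRIC_LABELS)
    if unknown:
        # Reject labels that are not part of the agreed rubric.
        raise ValueError(f"labels outside the rubric: {sorted(unknown)}")
    if not labels:
        return 0.0
    counts = Counter(labels)
    return (counts["good"] + counts["fair"]) / len(labels)


# Made-up reviewer labels for five model outputs.
sample_labels = ["good", "fair", "not good", "good", "fair"]
print(f"good/fair rate: {good_or_fair_rate(sample_labels):.0%}")
```

Rejecting labels outside the rubric keeps manual reviews consistent, which matters if the same labelled set is later used to check an automated evaluator.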
Example use cases: AI search, listing descriptions
The above framework can be applied to any ML-based product to identify the list of primary metrics for your product. Let's take search as an example.
Question | Metrics | Metric nature |
---|---|---|
Did the customer get an output? → coverage | % of search sessions with search results shown to the customer | Output |
How long did the product take to provide an output? → latency | Time taken to display search results to the user | Output |
Did the user like the output? → customer feedback, customer adoption and retention: did the user indicate that the output is right/wrong? | % of search sessions with 'thumbs up' feedback on search results from the customer, or % of search sessions with clicks from the customer | Output |
Did the user like the output? → customer feedback, customer adoption and retention: was the output good/fair? | For each search term, % of search results marked as 'good/fair' per quality rubric | Input |
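To make the search metrics concrete, here is a minimal sketch that computes them from logged sessions. The `SearchSession` record, its field names and the sample values are hypothetical, and using the mean for "time taken" is an assumption; a percentile may suit a real product better.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchSession:
    """One logged search session; every field name here is hypothetical."""
    results_shown: bool           # were any results displayed?
    latency_ms: Optional[float]   # time to display results, None if none shown
    thumbs_up: bool               # explicit positive feedback
    clicked: bool                 # the customer clicked a result


def search_metrics(sessions):
    """Compute coverage, mean latency and positive-signal rate."""
    total = len(sessions)
    if total == 0:
        return {"coverage": 0.0, "avg_latency_ms": None, "positive_rate": 0.0}
    shown = [s for s in sessions if s.results_shown]
    latencies = [s.latency_ms for s in shown if s.latency_ms is not None]
    # A session counts as positive if it has a thumbs up or a click.
    positive = [s for s in sessions if s.thumbs_up or s.clicked]
    return {
        "coverage": len(shown) / total,
        "avg_latency_ms": sum(latencies) / len(latencies) if latencies else None,
        "positive_rate": len(positive) / total,
    }


# Made-up sessions for illustration only.
sessions = [
    SearchSession(True, 120.0, True, False),
    SearchSession(True, 180.0, False, True),
    SearchSession(True, 300.0, False, False),
    SearchSession(False, None, False, False),
]
print(search_metrics(sessions))
```

The input metric in the last row (results rated 'good/fair' per search term) is not covered here because it comes from rubric-based review rather than session logs.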
How about a product that generates descriptions for a listing (whether it is a menu item on DoorDash or a product listing on Amazon)?
Question | Metrics | Metric nature |
---|---|---|
Did the customer get an output? → coverage | % of listings with a generated description | Output |
How long did the product take to provide an output? → latency | Time taken to generate descriptions for the user | Output |
Did the user like the output? → customer feedback, customer adoption and retention: did the user indicate that the output is right/wrong? | % of listings with generated descriptions that required edits from the technical content team/seller/customer | Output |
Did the user like the output? → customer feedback, customer adoption and retention: was the output good/fair? | % of listing descriptions marked as 'good/fair', per quality rubric | Input |
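One detail worth pinning down for the listing example is the denominator: coverage is measured over all listings, while the edit rate only makes sense over listings that actually received a generated description. The sketch below shows that split; the record keys and sample data are hypothetical.

```python
def listing_metrics(listings):
    """Compute coverage and edit rate from listing records.

    Each record is a dict with hypothetical keys:
      "has_description": bool, a description was generated
      "was_edited": bool, someone edited the generated description
    """
    total = len(listings)
    generated = [item for item in listings if item["has_description"]]
    edited = [item for item in generated if item["was_edited"]]
    return {
        # Share of all listings that received a generated description.
        "coverage": len(generated) / total if total else 0.0,
        # Share of generated descriptions that needed an edit.
        "edit_rate": len(edited) / len(generated) if generated else 0.0,
    }


# Made-up records for illustration only.
sample_listings = [
    {"has_description": True, "was_edited": False},
    {"has_description": True, "was_edited": True},
    {"has_description": True, "was_edited": False},
    {"has_description": True, "was_edited": False},
    {"has_description": False, "was_edited": False},
]
print(listing_metrics(sample_listings))
```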
The approach outlined above extends to many ML-based products. I hope this framework helps you define the right set of metrics for your ML model.
Sharanya Rao is a group product manager.