Measuring feedback quality and relevance
1. Measuring feedback quality and relevance
Welcome! We'll now focus on how to assess the relevance of the feedback we've collected.

2. Application of detecting anomalous feedback
Let's say we're training a model to improve product recommendations based on user reviews in a customer feedback system. Most feedback is helpful, but some reviews might be misleading or biased. Let's have a look at how this can be evaluated!

3. Detecting anomalous feedback
One effective way to filter out irrelevant predictions is by using confidence scores. Here's a helpful function we can build, which we'll call least_confidence. It takes a probability distribution array, computes the array's maximum value (ignoring any NaNs) and the number of labels, and calculates a confidence score as a function of the two. The probability distribution array comes from the output of the reward model; it represents the likelihood of the possible outcomes or labels predicted by the model for each feedback instance. The resulting score can be used, for instance, to filter out feedback whose confidence falls below a certain threshold, such as 0.5.

4. Detecting anomalous feedback
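A minimal sketch of such a function, assuming one common normalization (scaled so that a uniform distribution scores 0 and a one-hot distribution scores 1; the exact formula is an assumption, not given in the transcript):

```python
import numpy as np

def least_confidence(prob_dist):
    # Highest predicted probability, ignoring any NaNs
    max_prob = np.nanmax(prob_dist)
    num_labels = prob_dist.size
    # Assumed normalization: uniform distribution -> 0, one-hot -> 1
    return (num_labels * max_prob - 1) / (num_labels - 1)
```

For example, a distribution like [0.9, 0.1] yields a high confidence of 0.8, while a perfectly uniform distribution yields 0.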
Let's say, for example, that we are training a product recommendation system, and we have feedback from users on whether they found a recommendation useful or not. The feedback is processed by a reward model that outputs confidence scores, in the form of a probability distribution, for each piece of feedback. Now, we filter out feedback with low confidence, below 0.5, to ensure we only keep feedback that the model is more confident in. As expected, only one score is selected.

5. K-means
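The filtering step might look like the following sketch; the feedback strings and confidence scores here are made-up illustrations:

```python
import numpy as np

# Hypothetical feedback items with one confidence score each from the reward model
feedback = ["Great fit", "Not sure", "Loved it", "Confusing"]
scores = np.array([0.42, 0.31, 0.87, 0.18])

# Keep only feedback the model is confident about (score of at least 0.5)
mask = scores >= 0.5
filtered = [f for f, keep in zip(feedback, mask) if keep]
print(filtered)  # only one piece of feedback survives the filter
```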
Another way we can categorize feedback is k-means clustering. K-means is a popular algorithm for partitioning data into clusters. It aims to group data points so that points in the same cluster are more similar to each other than to those in other clusters. K-means is great for detecting anomalies because it groups similar feedback, making outliers easy to spot. It is also easy to understand and can be quickly implemented. To decide how many clusters we want, we typically use domain knowledge to determine the expected number of distinct types of feedback. Alternatively, analytical methods allow us to evaluate how well the data points fit within the clusters, but these won't be covered as part of this course.

6. Anomaly detection with k-means
To implement this as code, we first import pandas and numpy, along with the k-means algorithm from sklearn. We then define the detect_anomalies function. We initialize k-means with the number of clusters we want, setting the random state to 42 for reproducibility. Then, we fit k-means to our data and calculate the distance of each data point from its assigned cluster center using np.linalg.norm. Points with large distances from their cluster center are likely anomalies. Finally, we return these distances.

7. Anomaly detection with k-means
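Putting those steps together, a sketch of the function might look like this (the function body follows the transcript's description; the example scores and the single-cluster choice are assumptions):

```python
import numpy as np
import pandas as pd  # imported alongside numpy, as in the walkthrough
from sklearn.cluster import KMeans

def detect_anomalies(data, n_clusters):
    # Initialize k-means with a fixed random state for reproducibility
    kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
    kmeans.fit(data)
    # Distance of each point from its assigned cluster center
    centers = kmeans.cluster_centers_[kmeans.labels_]
    return np.linalg.norm(data - centers, axis=1)

# Hypothetical feedback scores (e.g. 1-5 stars) from five users
scores = np.array([[4.5], [4.7], [1.0], [4.6], [4.4]])
distances = detect_anomalies(scores, n_clusters=1)
print(np.argmax(distances))  # the third score (index 2) lies furthest from the center
```

With a single cluster, every point is compared against the overall centroid of the feedback, so the score that sits far from the rest stands out immediately.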
Let's say we have collected feedback scores on a product from 5 different users. We can use the function we just defined to check whether any feedback should be considered an anomaly. After running the function, we see from the output distances that the third score is the furthest from the center of the cluster. This is the anomaly, and we can inspect the data further and decide whether to remove it or not.

8. Let's practice!
Let's practice detecting anomalies!