Exploring an alternative classification metric
In the world of data science, metrics are the compass that guides our models to success. While many are familiar with the classic measures of precision and recall, there is a wide range of other options worth exploring.
In this article, we’ll dive into the Tversky index. This metric, a generalization of the Dice and Jaccard coefficients, can be extremely useful when trying to balance precision and recall against each other. When implemented as a loss function for neural networks, it can be a powerful way to deal with class imbalances.
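For reference, one common way of writing the Tversky index between a prediction set A and a ground-truth set B is S(A, B) = |A ∩ B| / (|A ∩ B| + α|A − B| + β|B − A|), where the weights α and β control how heavily false positives and false negatives are penalized; setting α = β = 0.5 recovers the Dice coefficient, while α = β = 1 gives the Jaccard index.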
A quick refresher on precision and recall
Imagine you are a detective tasked with capturing criminals in your town. In truth, there are 10 criminals roaming the streets.
In your first month, you bring in 8 suspects you assume to be criminals. Only 4 of them end up being guilty, while the other 4 are innocent.
If you were a machine learning model, you’d be evaluated against your precision and recall.
Precision asks: “of all those you caught, how many were criminals?”
Recall asks: “of all the criminals in the town, how many did you catch?”
Precision captures how accurate your positive predictions are, regardless of how many actual positives you miss (false negatives). Recall measures how many of the actual positives you capture, irrespective of how many false positives you rack up.
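In terms of true positives (TP), false positives (FP) and false negatives (FN), the two metrics are:
- precision = TP / (TP + FP)
- recall = TP / (TP + FN)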
How do your detective skills rate against these metrics?
- precision = 4 / (4 + 4) = 0.5
- recall = 4 / (4 + 6) = 0.4
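If you want to double-check these numbers, here is a minimal sketch using scikit-learn. The 20-person town and the ordering of the labels are assumptions made purely for illustration:

```python
from sklearn.metrics import precision_score, recall_score

# 1 = criminal, 0 = innocent; assume a town of 20 people, 10 of them criminals
y_true = [1] * 10 + [0] * 10

# The detective arrests 8 people: 4 actual criminals and 4 innocents
y_pred = [1] * 4 + [0] * 6 + [1] * 4 + [0] * 6

print(precision_score(y_true, y_pred))  # 4 / (4 + 4) = 0.5
print(recall_score(y_true, y_pred))     # 4 / (4 + 6) = 0.4
```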
Balancing precision and recall: the F1 metric
In an ideal world, your classifier has both high precision and high recall. To measure how well a classifier does on both at once, the F1 score takes the harmonic mean of the two:
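F1 = (2 × precision × recall) / (precision + recall) = 2TP / (2TP + FP + FN)

For our detective, that gives F1 = (2 × 0.5 × 0.4) / (0.5 + 0.4) ≈ 0.44. Note how the harmonic mean is pulled toward the lower of the two scores, so a classifier can't hide poor recall behind high precision (or vice versa).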
This metric is also sometimes called the Dice similarity coefficient (DSC).
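In set notation, writing A for the set of predicted positives and B for the set of actual positives, DSC(A, B) = 2|A ∩ B| / (|A| + |B|), which reduces to exactly the F1 expression above because |A ∩ B| = TP, |A| = TP + FP and |B| = TP + FN. The Tversky index we introduced earlier generalizes this expression by re-weighting those two error terms.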