The ROC curve is very useful for estimating the quality of a binary classifier. But for a long time, I was confused about how it is generated. Although I knew the concepts of TPR (true positive rate) and FPR (false positive rate), I still couldn't see why the result displays as a curve, because in my mind a single TPR/FPR pair only gives one point for a model. How can one model produce many TPR/FPR pairs? Thanks to the book Introduction to Data Mining by Pang-Ning Tan et al., I finally got the point from an example in the book.
Since we're dealing with a binary classifier, it will only generate probabilities from 0 to 1. For example, a Bayesian classifier can generate posterior probabilities matching this characteristic. So assume our testing dataset is as follows: there are 5 positive and 5 negative data points, and our binary classifier gives each data point a prediction probability together with its true class label.
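Since the book's actual table isn't reproduced here, here is a dataset in the same spirit. Note that the probability values below are made up for illustration, not taken from the book:

```python
# A hypothetical testing dataset: each tuple is (predicted probability,
# true class label). 5 positives and 5 negatives, as in the setup above.
dataset = [
    (0.95, "+"), (0.87, "+"), (0.85, "-"), (0.76, "-"), (0.70, "+"),
    (0.53, "+"), (0.43, "-"), (0.31, "-"), (0.25, "-"), (0.05, "+"),
]
```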
With my previous understanding, the whole job would be finished here. But in fact, it's NOT! That's exactly where I got confused. The tricky part is: although we have all these prediction probabilities, what is the predicted result? How should we assign each data point a class label? Should data point 1 be negative? Should data point 3 be positive? The probabilities alone can't tell us the final class labels!
So, what's missing here? The cut-off line! At this stage, we haven't given a benchmark or baseline for deciding the class label. Moreover, different cut-off lines will give us different TPR/FPR pairs. For example, if we take the lowest probability in the dataset as the baseline, i.e., as the positive cut-off line, then every data point will be treated as positive, because anything with a probability greater than or equal to the cut-off is predicted positive. That's the definition of the cut-off baseline.
Then, if we raise the baseline to the second lowest probability, only the data point with the lowest probability is treated as negative, because it's the only one whose probability is lower than the baseline. But by doing so, we just made an FN point: that data point's actual class label is positive, but we predicted it as negative.
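To make this concrete, here is a small sketch of how one cut-off line turns the probabilities into a single TPR/FPR pair. The dataset values are made up for illustration:

```python
# Hypothetical dataset: (predicted probability, true class label).
dataset = [
    (0.95, "+"), (0.87, "+"), (0.85, "-"), (0.76, "-"), (0.70, "+"),
    (0.53, "+"), (0.43, "-"), (0.31, "-"), (0.25, "-"), (0.05, "+"),
]

def rates_at_cutoff(dataset, cutoff):
    """Predict positive when probability >= cutoff; return (TPR, FPR)."""
    tp = sum(1 for p, y in dataset if p >= cutoff and y == "+")
    fn = sum(1 for p, y in dataset if p < cutoff and y == "+")
    fp = sum(1 for p, y in dataset if p >= cutoff and y == "-")
    tn = sum(1 for p, y in dataset if p < cutoff and y == "-")
    return tp / (tp + fn), fp / (fp + tn)

# Cut-off at the second lowest probability: the lowest point (0.05, "+")
# falls below it and becomes a false negative, so TPR drops below 1.
print(rates_at_cutoff(dataset, 0.25))  # (0.8, 1.0)
```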
This process continues until the last case, where we treat all the data points as negative.
Once we have all those TPR/FPR pairs, we can draw the corresponding ROC curve. So in summary, what confused me was that I didn't realize a classifier's prediction probabilities alone are not enough: we also have to specify the cut-off line, and each cut-off line generates one TPR/FPR pair. That's how we get the final ROC curve.
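The whole sweep can be sketched as follows. Again, the probabilities are hypothetical, and I simply use every distinct predicted probability (plus one value above the maximum) as a cut-off:

```python
# Hypothetical dataset: (predicted probability, true class label).
dataset = [
    (0.95, "+"), (0.87, "+"), (0.85, "-"), (0.76, "-"), (0.70, "+"),
    (0.53, "+"), (0.43, "-"), (0.31, "-"), (0.25, "-"), (0.05, "+"),
]

def roc_points(dataset):
    """One (FPR, TPR) point per cut-off, from 'all positive' to 'all negative'."""
    pos = sum(1 for _, y in dataset if y == "+")
    neg = sum(1 for _, y in dataset if y == "-")
    # Ascending cut-offs; the final one (above every probability)
    # predicts everything negative.
    cutoffs = sorted({p for p, _ in dataset}) + [1.1]
    points = []
    for c in cutoffs:
        tp = sum(1 for p, y in dataset if p >= c and y == "+")
        fp = sum(1 for p, y in dataset if p >= c and y == "-")
        points.append((fp / neg, tp / pos))
    return points

for fpr, tpr in roc_points(dataset):
    print(fpr, tpr)  # plot TPR against FPR to get the ROC curve
```

The first cut-off (the lowest probability) predicts everything positive and gives the point (1, 1); the last one predicts everything negative and gives (0, 0); the points in between trace out the curve.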