# The generation of the ROC curve

The ROC curve is very useful for estimating the quality of a binary classifier. But for a long time, I was confused about how it is generated. Although I knew the concepts of TPR (true positive rate) and FPR (false positive rate), I still couldn't see why they display as a curve, because in my mind a model's TPR/FPR gives only one point. How can one model produce many TPR/FPR pairs? Thanks to the book *Introduction to Data Mining* by Pang-Ning Tan et al., I finally got the point from one of its examples.

Since we're dealing with a binary classifier, it will generate a probability ranging from 0 to 1. For example, a Bayesian classifier can generate posterior probabilities that match this characteristic. Thus, we can set up our dataset as follows: there are 5 positive and 5 negative data points in our testing dataset, and for each data point our binary classifier gives a prediction probability alongside its actual class label.
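Such a dataset can be sketched in code. Note that the probability values and class labels below are invented for illustration; they are a stand-in for the book's example table, not its actual numbers.

```python
# A hypothetical stand-in for the book's example table: 5 positive and
# 5 negative test points, each paired with the classifier's predicted
# probability. These values are invented, NOT the book's actual numbers.
dataset = [
    (0.95, "+"), (0.87, "-"), (0.85, "+"), (0.76, "-"), (0.62, "+"),
    (0.53, "-"), (0.43, "+"), (0.38, "-"), (0.31, "-"), (0.25, "+"),
]
n_pos = sum(1 for _, label in dataset if label == "+")
n_neg = sum(1 for _, label in dataset if label == "-")
print(n_pos, n_neg)  # 5 5
```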

With my previous understanding, the whole job would be finished here. But in fact, it's NOT! That's why I got confused. The tricky part is: although we have all these prediction probabilities, what is the predicted result? How should we assign each data point a class label? Should data point 1 be negative? Should data point 3 be positive? The probabilities alone can't show us the final class labels!

So, what's lacking here? The *cut-off* line! At this stage, we haven't given the benchmark or baseline that decides how to determine the class label. Moreover, each different *cut-off* line will give us a different TPR/FPR pair. For example, suppose we treat the lowest probability above as the baseline, i.e. as the positive *cut-off* line. Then all the data points will be predicted as positive, because anything with a probability larger than or equal to the cut-off is treated as positive, which is exactly the definition of the *cut-off* baseline.
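A quick check of this first cut-off, again using invented values: with the cut-off at the lowest predicted probability, every point passes, so every point is predicted positive, and both TPR and FPR come out as 1.

```python
# With the cut-off at the lowest predicted probability, every point
# scores >= the cut-off, so everything is predicted positive: TPR = FPR = 1.
# Probabilities and labels are invented for illustration.
dataset = [
    (0.95, "+"), (0.87, "-"), (0.85, "+"), (0.76, "-"), (0.62, "+"),
    (0.53, "-"), (0.43, "+"), (0.38, "-"), (0.31, "-"), (0.25, "+"),
]
cutoff = min(p for p, _ in dataset)
tp = sum(1 for p, y in dataset if p >= cutoff and y == "+")  # all 5 positives
fp = sum(1 for p, y in dataset if p >= cutoff and y == "-")  # all 5 negatives
tpr, fpr = tp / 5, fp / 5
print(tpr, fpr)  # 1.0 1.0
```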

Then, if we raise the baseline to the second-lowest probability, only one data point is predicted as negative: the one with the lowest probability, since it's the only point whose probability falls below the baseline. But in doing so, we have just produced an FN point, because that data point's actual class label is positive, yet we predicted it as negative.
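This step can be sketched too. In the invented dataset below, the lowest-scoring point happens to be an actual positive, mirroring the article's scenario, so raising the cut-off to the second-lowest probability produces exactly one false negative.

```python
# Raising the cut-off to the second-lowest probability turns exactly one
# point negative: the lowest-scoring one. In this invented example that
# point is actually positive, so it becomes a false negative (FN).
dataset = [
    (0.95, "+"), (0.87, "-"), (0.85, "+"), (0.76, "-"), (0.62, "+"),
    (0.53, "-"), (0.43, "+"), (0.38, "-"), (0.31, "-"), (0.25, "+"),
]
cutoff = sorted(p for p, _ in dataset)[1]  # second-lowest probability
fn = sum(1 for p, y in dataset if p < cutoff and y == "+")
tn = sum(1 for p, y in dataset if p < cutoff and y == "-")
print(fn, tn)  # one false negative, no true negatives yet
```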

The process continues, raising the cut-off line step by step, until the last case, where we treat all the data points as negative.
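The whole sweep can be sketched as follows, again with invented data: one cut-off per distinct probability, plus one cut-off above the maximum so that everything is predicted negative, and each cut-off yields one (FPR, TPR) point of the ROC curve.

```python
# Sweep the cut-off over every distinct predicted probability, plus one
# value above the maximum (which predicts everything negative). Each
# cut-off yields one (FPR, TPR) pair: one point of the ROC curve.
# The dataset values are invented for illustration.
dataset = [
    (0.95, "+"), (0.87, "-"), (0.85, "+"), (0.76, "-"), (0.62, "+"),
    (0.53, "-"), (0.43, "+"), (0.38, "-"), (0.31, "-"), (0.25, "+"),
]
n_pos = sum(1 for _, y in dataset if y == "+")
n_neg = len(dataset) - n_pos
cutoffs = sorted({p for p, _ in dataset}) + [1.1]  # 1.1: nothing passes
roc = []
for c in cutoffs:
    tp = sum(1 for p, y in dataset if p >= c and y == "+")
    fp = sum(1 for p, y in dataset if p >= c and y == "-")
    roc.append((fp / n_neg, tp / n_pos))  # (FPR, TPR)
print(roc[0], roc[-1])  # (1.0, 1.0) (0.0, 0.0)
```

The lowest cut-off gives the corner (1.0, 1.0), and the final all-negative cut-off gives (0.0, 0.0); the points in between trace out the curve.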

Once we've collected all those TPR/FPR pairs, we can draw the corresponding ROC curve. So, in summary, the thing that confused me was that I hadn't noticed that a classifier giving prediction probabilities alone is not enough. We also have to choose the corresponding *cut-off* line, and each *cut-off* line generates one TPR/FPR pair. That's how we get the final ROC curve.