**Some Definitions:**

TP = true positive, FP = false positive, FN = false negative, TN = true negative

**% of Correct Predictions:** For a binary outcome Y (0 or 1), the percentage of all predictions that are correct (again see here for more details), or (TP + TN) / (TP + FP + TN + FN)

**Precision:** Percentage of predicted 1’s that are correct, or TP / (TP + FP)

**Recall:** Percentage of total observed or true 1’s correctly classified, or TP / (TP + FN); also called the *true positive rate*

**False Positive Rate:** FP / (FP + TN)

**F1-Score:** The harmonic mean of precision and recall, or (2 * Precision * Recall) / (Precision + Recall)

**True positive rate:** TP / (TP + FN) = Recall

**Sensitivity:** = Recall

**Specificity:** Percentage of total observed or true 0’s correctly classified, or TN / (FP + TN), or 1 - false positive rate
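The definitions above can be collected into one small function. This is only a minimal sketch: the function name `classification_metrics` and the example counts are made up for illustration, and it assumes none of the denominators is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the metrics defined above from the four confusion-matrix counts.

    Assumes every denominator is non-zero (hypothetical helper, not from a library).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also the true positive rate / sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "false_positive_rate": fp / (fp + tn),
        "specificity": tn / (fp + tn),  # equals 1 - false positive rate
        "f1": 2 * precision * recall / (precision + recall),
    }


# Illustrative counts only (not taken from any real model).
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, specificity (0.75) and the false positive rate (0.25) sum to 1, as the definitions imply.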

All of the metrics mentioned above depend on classifying
predictions with a cutoff. If the predicted probability exceeds some threshold
‘c’, we assign that observation a class value = 1; otherwise the
observation is assigned a value = 0. Each metric therefore reflects a single
chosen cutoff (one could examine multiple cutoffs and find the optimal value
for c).
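To make the role of the cutoff concrete, here is a small sketch that applies several values of c to the same predicted probabilities and counts TP, FP, FN and TN each time. The helper names `classify` and `confusion_counts`, the labels, and the probabilities are all invented for illustration.

```python
def classify(probabilities, c=0.5):
    """Assign class 1 when the predicted probability exceeds the cutoff c, else 0."""
    return [1 if p > c else 0 for p in probabilities]


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return tp, fp, fn, tn


# Made-up labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.6, 0.4, 0.7, 0.3, 0.1]

counts_by_cutoff = {}
for c in (0.25, 0.5, 0.75):
    counts_by_cutoff[c] = confusion_counts(y_true, classify(probs, c))
    print(c, counts_by_cutoff[c])
```

Raising c trades false positives for false negatives, which is why any metric computed at a single cutoff tells only part of the story.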

As explained in a previous post, the ROC curve is constructed
by examining all possible cutoffs. The ROC
curve visualizes the tradeoff between the true positive rate and the false
positive rate, or sensitivity vs. 1 - specificity. We are usually
interested in particular in the area under the ROC curve (AROC or c-statistic), which is a measure of a model’s discriminatory
power. The area under the ROC curve can
be interpreted as the probability that a classifier will correctly rank a
randomly chosen training example with a
positive outcome higher than a randomly chosen example with a negative
outcome (Cook, 2007).
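The two views of the area under the ROC curve described above can be checked against each other on a toy example. The sketch below is an illustration under assumptions, not a reference implementation: the function names and data are hypothetical, ties between a positive and a negative score are counted as half a correct ranking, and the sweep uses one cutoff per distinct score (with `>=`) so that every point on the empirical curve is visited.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by sweeping the cutoff over every distinct score."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # cutoff above every score: nothing is predicted 1
    for c in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= c and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= c and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def auc_pairwise(y_true, scores):
    """Share of (positive, negative) pairs where the positive is scored higher."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.1]

area = auc_trapezoid(roc_points(y_true, scores))
rank_prob = auc_pairwise(y_true, scores)
print(area, rank_prob)
```

On this toy data, 7 of the 9 positive/negative pairs are ranked correctly, and the trapezoid area under the swept curve gives the same value.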

This method is used increasingly in the machine
learning community and is preferred over other measures of fit like precision
or the F1-Score because it evaluates model performance across all considered
cutoff values rather than at an arbitrarily chosen cutoff (Bradley, 1997). It also
gives low scores to random or one-class-only classifiers
(Bradley, 1997).
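The point about one-class classifiers can be illustrated with a small, invented example: on imbalanced data, a classifier that gives every observation the same score looks good on percent correct but gets an area under the ROC curve of 0.5. The data and the helper `auc_pairwise` are hypothetical, and ties are counted as half a correct ranking.

```python
def auc_pairwise(y_true, scores):
    """Share of (positive, negative) pairs where the positive is scored higher.

    Ties count as half a correct ranking (an assumption of this sketch).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up imbalanced data: 10 positives, 90 negatives.
y_true = [1] * 10 + [0] * 90

# A "one class only" classifier: the same score for every observation.
constant_scores = [0.0] * 100

# At a cutoff of 0.5 every observation is predicted 0.
predictions = [1 if s > 0.5 else 0 for s in constant_scores]
accuracy = sum(1 for y, p in zip(y_true, predictions) if y == p) / len(y_true)
auc = auc_pairwise(y_true, constant_scores)

print(accuracy, auc)
```

Here percent correct is 0.9 simply because 90% of the observations are 0’s, while the area under the curve stays at the no-discrimination value of 0.5.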

**References:**

Bradley, Andrew P. Pattern Recognition, Volume 30, Issue 7 (July 1997), pp. 1145-1159. Elsevier Science.

Provost, F. J., Fawcett, T., & Kohavi, R. (1998). The Case against Accuracy Estimation for Comparing Induction Algorithms. *Proceedings of the Fifteenth International Conference on Machine Learning* (ICML '98) (pp. 445-453), Jude W. Shavlik (Ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

Cook, Nancy R. Use and Misuse of the Receiver Operating Characteristic Curve in Risk Prediction. *Circulation*. 2007;115:928-935.

Pattern Recognition Letters 27 (2006) 861–874.
