Some Definitions:

TP = true positive, FP = false positive, FN = false negative, TN = true negative

% of Correct Predictions (accuracy): For a binary outcome Y ∈ {0,1}, the percentage of total predictions that are correct, or (TP + TN)/(TP + FP + TN + FN)

Precision: Percentage of predicted 1's that are actually 1's, or TP/(TP + FP)

Recall: Percentage of total observed or true 1's correctly classified, or TP/(TP + FN); also called the true positive rate

False Positive Rate: FP/(FP + TN)

F1-Score: The harmonic mean of precision and recall, or (2*Precision*Recall)/(Precision + Recall)

True Positive Rate: TP/(TP + FN) = Recall

Sensitivity: = Recall

Specificity: Percentage of total observed or true 0's correctly classified, or TN/(FP + TN), or 1 - false positive rate
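To make these formulas concrete, here is a small Python sketch (not from the original post) that computes each metric from confusion-matrix counts; the values of tp, fp, fn, and tn below are made up purely for illustration.

```python
# Hypothetical confusion-matrix counts (illustrative only)
tp, fp, fn, tn = 40, 10, 20, 30

accuracy    = (tp + tn) / (tp + fp + tn + fn)       # % of correct predictions
precision   = tp / (tp + fp)                        # share of predicted 1's that are true 1's
recall      = tp / (tp + fn)                        # true positive rate / sensitivity
fpr         = fp / (fp + tn)                        # false positive rate
f1          = 2 * precision * recall / (precision + recall)
specificity = tn / (fp + tn)                        # equals 1 - fpr

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
print(f"FPR={fpr:.3f} F1={f1:.3f} specificity={specificity:.3f}")
```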
All of the metrics mentioned above classify predictions based on a cutoff: if the predicted probability exceeds some threshold c, then we assign that observation a class value of 1; otherwise the observation is assigned a value of 0. These metrics are therefore based on a single chosen cutoff (one could examine multiple cutoffs and find the optimal value for c).
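As a minimal sketch of this cutoff rule (the predicted probabilities in p_hat are hypothetical, not from the post):

```python
import numpy as np

# Hypothetical predicted probabilities from some classifier
p_hat = np.array([0.10, 0.35, 0.62, 0.80, 0.48])

c = 0.5                              # chosen cutoff
y_pred = (p_hat > c).astype(int)     # class 1 if the probability exceeds c, else 0
print(y_pred)                        # [0 0 1 1 0]
```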
As explained in a previous post, the ROC curve is constructed by examining all possible cutoffs. The ROC curve visualizes the tradeoff between the true positive rate and the false positive rate, or sensitivity vs. 1 - specificity. In particular, we are usually interested in the area under the ROC curve (AUC, or c-statistic). The ROC curve is a measure of a model's discriminatory power. The area under the ROC curve can be interpreted as the probability that a classifier will correctly rank a randomly chosen training example with a positive outcome higher than a randomly chosen example with a negative outcome (Cook, 2007).
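The sketch below illustrates both points under stated assumptions: it traces the curve over all cutoffs with scikit-learn's roc_curve and roc_auc_score, and it checks the rank interpretation directly by comparing every positive-negative pair. The labels in y_true and scores in p_hat are made up for demonstration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical labels and predicted probabilities, for illustration only
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
p_hat  = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.65, 0.5, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, p_hat)   # one (FPR, TPR) point per cutoff
auc = roc_auc_score(y_true, p_hat)

# Rank interpretation: AUC equals the probability that a randomly chosen
# positive example is scored higher than a randomly chosen negative example
pos = p_hat[y_true == 1]
neg = p_hat[y_true == 0]
rank_prob = np.mean([1.0 if p > n else 0.5 if p == n else 0.0
                     for p in pos for n in neg])

print(f"AUC = {auc:.3f}, pairwise rank probability = {rank_prob:.3f}")  # the two should match
```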
This approach is used increasingly in the machine learning community and is preferred over other measures of fit like precision or the F1-score because it evaluates model performance across all considered cutoff values rather than a single, arbitrarily chosen cutoff (Bradley, 1997). It also yields a measure of classifier performance that assigns low scores to random classifiers and to classifiers that predict only one class (Bradley, 1997).
References:
Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7), 1145-1159.
Provost, F. J., Fawcett, T., & Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. In J. W. Shavlik (Ed.), Proceedings of the Fifteenth International Conference on Machine Learning (ICML '98) (pp. 445-453). San Francisco, CA: Morgan Kaufmann Publishers.
Cook, N. R. (2007). Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation, 115, 928-935.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861-874.