Metrics
`bench_metrics(scores, labels=None, in_value=0, out_value=1, metrics=['auroc', 'fpr95tpr'], threshold=None, step=4)`
Compute common metrics from the OOD detector scores: AUROC, FPR95TPR (or any similar confusion-matrix-based metric), detection accuracy, and any metric from `sklearn.metrics`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `scores` | `Union[ndarray, Tuple[ndarray, ndarray]]` | Scores output by the OOD detector to evaluate. If a tuple is provided, the first array is treated as in-distribution scores and the second as out-of-distribution scores. | *required* |
| `labels` | `Optional[ndarray]` | Labels denoting OOD-ness. When `scores` is a tuple, this argument and the following `in_value` and `out_value` are unused. When `scores` is a single `np.ndarray`, `labels` is required, along with `in_value` and `out_value` if they differ from their defaults. | `None` |
| `in_value` | `Optional[int]` | OOD label value for in-distribution data. | `0` |
| `out_value` | `Optional[int]` | OOD label value for out-of-distribution data. | `1` |
| `metrics` | `Optional[List[str]]` | List of metrics to compute. Can be any metric name from `sklearn.metrics`, `"detect_acc"`, or a confusion-matrix-based metric name such as `"fpr95tpr"`. | `['auroc', 'fpr95tpr']` |
| `threshold` | `Optional[float]` | Threshold to use for threshold-dependent metrics. | `None` |
| `step` | `Optional[int]` | Integration step (w.r.t. percentile). Only used for `"auroc"` and `"fpr95tpr"`. | `4` |
Returns:
| Name | Type | Description |
|---|---|---|
| `dict` | `dict` | Dictionary of metrics |
Source code in `oodeel/eval/metrics.py`
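A minimal usage sketch based on the signature above. The Gaussian scores are synthetic stand-ins for real detector outputs, and the example assumes the usual OOD-score convention that higher values indicate out-of-distribution data:

```python
import numpy as np
from oodeel.eval.metrics import bench_metrics

rng = np.random.default_rng(0)

# Synthetic detector scores: OOD samples assumed to score higher than ID ones
id_scores = rng.normal(loc=0.0, scale=1.0, size=1000)
ood_scores = rng.normal(loc=2.0, scale=1.0, size=1000)

# Tuple form: (in-distribution scores, out-of-distribution scores)
metrics = bench_metrics((id_scores, ood_scores), metrics=["auroc", "fpr95tpr"])
print(metrics)  # e.g. {'auroc': ..., 'fpr95tpr': ...}

# Equivalent flat form: one score array plus oodness labels,
# using the default label values (0 = in-distribution, 1 = OOD)
scores = np.concatenate([id_scores, ood_scores])
labels = np.concatenate([np.zeros(1000), np.ones(1000)])
metrics = bench_metrics(scores, labels=labels, metrics=["auroc", "fpr95tpr"])
```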
`ftpn(scores, labels, threshold)`
Computes, for a given threshold, the number of:

* true positives
* false positives
* true negatives
* false negatives
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `scores` | `ndarray` | Scores output by the OOD detector to evaluate | *required* |
| `labels` | `ndarray` | 1 if OOD else 0 | *required* |
| `threshold` | `float` | Threshold used to classify scores as in-distribution or out-of-distribution | *required* |
Returns:
| Type | Description |
|---|---|
| `tuple` | `Tuple[float]`: the four counts (TP, FP, TN, FN) |
Source code in `oodeel/eval/metrics.py`
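A small sketch of `ftpn` on toy data. The unpacking order (TP, FP, TN, FN) follows the docstring listing above and should be treated as an assumption, as should the convention that scores above the threshold are flagged as OOD:

```python
import numpy as np
from oodeel.eval.metrics import ftpn

scores = np.array([0.10, 0.40, 0.35, 0.80])  # toy OOD scores
labels = np.array([0, 0, 1, 1])              # 1 if OOD else 0

# Confusion-matrix counts at a single threshold; unpacking order
# (TP, FP, TN, FN) assumed from the docstring listing above
tp, fp, tn, fn = ftpn(scores, labels, threshold=0.5)

tpr = tp / (tp + fn)  # true positive rate at this threshold
fpr = fp / (fp + tn)  # false positive rate at this threshold
```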
`get_curve(scores, labels, step=4, return_raw=False)`
Computes, for a range of threshold values:

* true positive rate: TP / (TP + FN)
* false positive rate: FP / (FP + TN)
* true negative rate: TN / (FP + TN)
* false negative rate: FN / (TP + FN)
* accuracy: (TN + TP) / (TP + FP + TN + FN)

The threshold values are uniformly distributed among the percentiles, with a step of 4 / `scores.shape[0]`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `scores` | `ndarray` | Scores output by the OOD detector to evaluate | *required* |
| `labels` | `ndarray` | 1 if OOD else 0 | *required* |
| `step` | `Optional[int]` | Integration step (w.r.t. percentile). | `4` |
| `return_raw` | `Optional[bool]` | Whether to return all the curves or only the rate curves. | `False` |
Returns:
| Type | Description |
|---|---|
| `Union[Tuple[tuple, tuple], tuple]` | `Union[Tuple[Tuple[np.ndarray], Tuple[np.ndarray]], Tuple[np.ndarray]]`: curves |
Source code in `oodeel/eval/metrics.py`
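A sketch of `get_curve` under the same synthetic setup. The grouping of the returned tuples follows the return type hint above; the exact ordering of the curves inside each tuple is an assumption, so treat the unpacking as illustrative:

```python
import numpy as np
from oodeel.eval.metrics import get_curve

rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0.0, 1.0, 500),   # ID scores
                         rng.normal(2.0, 1.0, 500)])  # OOD scores
labels = np.concatenate([np.zeros(500), np.ones(500)])

# Default: a tuple of rate curves (the rates listed above), with one
# value per threshold swept over the score percentiles
rate_curves = get_curve(scores, labels, step=4)

# return_raw=True also returns the raw count curves; per the return
# type hint, the result is then a tuple of two tuples (order assumed)
raw_curves, rate_curves = get_curve(scores, labels, step=4, return_raw=True)
```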