indices module
Statistical Parity - Disparate Impact - Demographic Parity
The rates of value-1 predictions must be equal across the groups defined by the sensitive variable S, i.e. the predictor must be independent of the protected variable.
- S binary: \(P(f(X)=1|S=0) = P(f(X)=1|S=1)\)
- S continuous or discrete: \(P(f(X)=1|S) = P(f(X)=1)\)
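A minimal NumPy sketch of the empirical check for a binary or discrete S (a continuous S would first have to be binned). The helper names are hypothetical and not part of the library. The sketch reports both the largest difference and the min/max ratio of the group positive rates, two common ways of summarising the gap; it is not necessarily the exact score returned by disparate_impact.

```python
import numpy as np

def positive_rate_by_group(y_pred: np.ndarray, s: np.ndarray) -> dict:
    """Empirical P(f(X)=1 | S=g) for every observed value g of S."""
    return {g.item(): float(np.mean(y_pred[s == g])) for g in np.unique(s)}

def statistical_parity_gap(y_pred, s) -> float:
    """Largest difference between group positive rates (0 means exact parity)."""
    rates = positive_rate_by_group(y_pred, s)
    return max(rates.values()) - min(rates.values())

def disparate_impact_ratio(y_pred, s) -> float:
    """Smallest over largest positive rate (1 means exact parity)."""
    rates = positive_rate_by_group(y_pred, s)
    lo, hi = min(rates.values()), max(rates.values())
    return float("nan") if hi == 0 else lo / hi

y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # model decisions
s = np.array([0, 0, 0, 0, 1, 1, 1, 1])       # binary sensitive variable
print(positive_rate_by_group(y_pred, s))      # {0: 0.75, 1: 0.25}
print(statistical_parity_gap(y_pred, s))      # 0.5
```

Here group S=0 receives a positive decision three times as often as group S=1, so the ratio is 1/3.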
Avoiding Disparate Treatment
For fixed values of the other features, the probability that an input leads to prediction 1 should not depend on the value of the sensitive variable.
- S binary: \(P(f(X)=1|X_S=x,S=0) = P(f(X)=1|X_S=x,S=1)\), where \(X_S\) represents \(X\) without the sensitive variable.
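For a deterministic classifier, this condition can be probed by flipping S while keeping \(X_S\) fixed. The sketch below assumes a hypothetical predict callable that takes the full feature matrix with S stored in column s_col; it reports the share of rows whose prediction changes, so 0 means no disparate treatment on that sample.

```python
import numpy as np

def flip_rate(predict, X: np.ndarray, s_col: int) -> float:
    """Share of rows whose prediction changes when the binary S in column s_col
    is switched from 0 to 1, all other features (X_S) being kept fixed."""
    X0, X1 = X.copy(), X.copy()
    X0[:, s_col] = 0  # every row evaluated with S = 0
    X1[:, s_col] = 1  # the same rows evaluated with S = 1
    return float(np.mean(predict(X0) != predict(X1)))

# Hypothetical models: column 0 is a regular feature, column 1 is S.
biased = lambda X: (X[:, 0] + X[:, 1] > 1).astype(int)  # uses S
blind = lambda X: (X[:, 0] > 1).astype(int)             # ignores S

X = np.array([[0.5, 0.0], [1.5, 1.0], [0.2, 1.0], [2.0, 0.0]])
print(flip_rate(biased, X, s_col=1))  # 0.5
print(flip_rate(blind, X, s_col=1))   # 0.0
```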
Equality of Odds
The true-positive and false-positive rates must be equal across groups: conditionally on the true label Y, the prediction (and hence the error of the model) is independent of the protected variable.
- S binary: \(P(f(X)=1|Y=i,S=0) = P(f(X)=1|Y=i,S=1),\ i=0,1\)
- S general: \(P(f(X)=1|Y=i,S) = P(f(X)=1|Y=i),\ i=0,1\)
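A minimal sketch of the binary-label check: \(i=1\) gives the true-positive rate and \(i=0\) the false-positive rate of each group. The helper names are hypothetical, and the sketch assumes every (group, label) pair is present in the sample (an empty pair would yield nan).

```python
import numpy as np

def rates_given_label(y_true, y_pred, s) -> dict:
    """Empirical P(f(X)=1 | Y=i, S=g) for i in {0, 1} and every group g."""
    out = {}
    for g in np.unique(s):
        for i in (0, 1):
            mask = (s == g) & (y_true == i)
            out[(g.item(), i)] = float(np.mean(y_pred[mask]))
    return out

def equalized_odds_gap(y_true, y_pred, s) -> float:
    """Largest between-group gap, over i=0 (FPR) and i=1 (TPR)."""
    r = rates_given_label(y_true, y_pred, s)
    groups = {g for g, _ in r}
    return max(
        max(r[(g, i)] for g in groups) - min(r[(g, i)] for g in groups)
        for i in (0, 1)
    )

y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 1, 0])
s = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(rates_given_label(y_true, y_pred, s))
# {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 0.5}
print(equalized_odds_gap(y_true, y_pred, s))  # 0.5
```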
Avoiding Disparate Mistreatment
The probability that a prediction is wrong (the misclassification rate) should be the same regardless of the value of the sensitive variable.
- S binary: \(P(f(X)\ne Y|S=1) = P(f(X)\ne Y|S=0)\)
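This is the positive-rate computation of statistical parity applied to the error indicator \(1[f(X)\ne Y]\) instead of the prediction. A minimal sketch with hypothetical helper names:

```python
import numpy as np

def error_rate_by_group(y_true, y_pred, s) -> dict:
    """Empirical P(f(X) != Y | S=g) for each group g."""
    errors = (y_true != y_pred).astype(float)  # 1 where the prediction is wrong
    return {g.item(): float(np.mean(errors[s == g])) for g in np.unique(s)}

def mistreatment_gap(y_true, y_pred, s) -> float:
    """Largest difference between group error rates (0 means no mistreatment)."""
    rates = error_rate_by_group(y_true, y_pred, s)
    return max(rates.values()) - min(rates.values())

y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 1, 0])
s = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(error_rate_by_group(y_true, y_pred, s))  # {0: 0.0, 1: 0.5}
print(mistreatment_gap(y_true, y_pred, s))     # 0.5
```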
Global Sensitivity Analysis
GSA is used for quantifying the influence of a set of features on the outcome.
Sobol' indices are variance-based and need access to the model function (to evaluate it on new inputs), while CVM indices are rank-based and need only a sample of evaluations.
Sobol' indices
Four indices quantify how much of the output variance can be explained by the variance of \(X_i\):
Index | Correlation between variables | Joint contributions |
---|---|---|
\(Sob_i\) | ✔️ | ❌ |
\(SobT_i\) | ✔️ | ✔️ |
\(Sob_i^{ind}\) | ❌ | ❌ |
\(SobT_i^{ind}\) | ❌ | ✔️ |
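To illustrate why Sobol' indices need access to the function, here is a minimal pick-freeze sketch of the classical first-order and total indices. It assumes independent U(0,1) inputs, so it does not capture the correlation-aware variants listed in the table, and it uses the Saltelli (first-order) and Jansen (total) estimators, which are not necessarily the ones used by sobol_indices. The function name sobol_pick_freeze is hypothetical.

```python
import numpy as np

def sobol_pick_freeze(f, d, n=10_000, seed=0):
    """First-order and total Sobol' indices of f for d independent U(0,1) inputs."""
    rng = np.random.default_rng(seed)
    A = rng.random((n, d))
    B = rng.random((n, d))
    fA, fB = f(A), f(B)
    var = np.var(np.concatenate([fA, fB]))  # total output variance
    first, total = np.empty(d), np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]  # A with column i taken from B: f is evaluated on new points
        fABi = f(ABi)
        first[i] = np.mean(fB * (fABi - fA)) / var        # Saltelli-style estimator
        total[i] = 0.5 * np.mean((fA - fABi) ** 2) / var  # Jansen-style estimator
    return first, total

# Only the first input matters in this toy model.
first, total = sobol_pick_freeze(lambda X: X[:, 0], d=2, n=20_000)
print(first, total)
```

In this toy model both indices of the first input are close to 1 (up to Monte Carlo error), and both indices of the second input are exactly 0, because replacing its column never changes the output.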
Cramér-von Mises indices
The two CVM indices extend the Sobol' indices: they quantify more than just the second-order (variance) influence of the inputs on the output, since they compare whole output distributions rather than variances.
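The sketch below is a plug-in estimate of the usual Cramér-von Mises sensitivity index for a discrete input, \(\int E[(F(t)-F(t|X_i))^2]\,dF(t) / \int F(t)(1-F(t))\,dF(t)\), where \(F\) is the CDF of the output. It only needs a sample of (input, output) pairs, not the function. It is an assumption-laden illustration: the name cvm_index_discrete is hypothetical, it covers a single index, and it is not necessarily the estimator used by cvm_indices.

```python
import numpy as np

def cvm_index_discrete(x, y) -> float:
    """Plug-in Cramér-von Mises index of a discrete input x on the output y."""
    x, y = np.asarray(x), np.asarray(y)
    t = y  # integrate over t drawn from the empirical distribution of Y
    F = np.mean(y[None, :] <= t[:, None], axis=1)  # F(t) at each sample point
    num = np.zeros_like(F)
    for g in np.unique(x):
        mask = x == g
        F_g = np.mean(y[mask][None, :] <= t[:, None], axis=1)  # F(t | X = g)
        num += np.mean(mask) * (F - F_g) ** 2  # weighted by P(X = g)
    den = F * (1.0 - F)
    # A constant y gives den == 0, hence nan: a silent failure mode.
    return float(np.mean(num) / np.mean(den))

x = np.array([0, 0, 1, 1])
print(cvm_index_discrete(x, np.array([0, 0, 1, 1])))  # 1.0 (Y is a function of X)
print(cvm_index_discrete(x, np.array([0, 1, 0, 1])))  # 0.0 (Y independent of X here)
```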
For further details about GSA in Fairness
Use-Case Recap
Sensitive variable | Disparate Impact | Avoiding Disparate Treatment | Equality of Odds | Avoiding Disparate Mistreatment | Sobol' indices | Cramér-von Mises indices |
---|---|---|---|---|---|---|
S binary | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
S discrete | ✔️ | ❌ | ✔️ | ❌ | ✔️ | ✔️ |
S continuous | ✔️ | ❌ | ✔️ | ❌ | ✔️ | ✔️ |
disparate_impact(index_input, group_reduction=np.mean)
Compute the disparate impact.
Warning
Disparate impact and equality of odds can only be computed on classification problems and on categorical variables. Continuous variables are dropped and their output is replaced by np.nan.
Note
When applied with target=classification_error, this function computes the equality of odds.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index_input | IndicesInput | The fairness problem to study. | required |
group_reduction | | The method used to compute the indices for a group of variables. By default, the average of the values of each group is applied. | np.mean |
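The group_reduction parameter collapses the values of the columns belonging to one variable group (for instance the dummies produced by one-hot encoding) into a single value. A minimal sketch of that reduction, assuming per-column values are already computed; the helper reduce_by_group, its dictionary input format and the column names are hypothetical and do not reflect the library's internals.

```python
import numpy as np

def reduce_by_group(per_column: dict, groups: dict, group_reduction=np.mean) -> dict:
    """Collapse per-column index values into one value per variable group.

    per_column: column name -> index value (hypothetical format)
    groups: group name -> list of the column names in that group
    """
    return {
        name: float(group_reduction([per_column[c] for c in cols]))
        for name, cols in groups.items()
    }

per_column = {"sex_F": 0.2, "sex_M": 0.4, "age": 0.1}
groups = {"sex": ["sex_F", "sex_M"], "age": ["age"]}
print(reduce_by_group(per_column, groups))           # averages the two sex dummies
print(reduce_by_group(per_column, groups, np.max))   # {'sex': 0.4, 'age': 0.1}
```

Passing np.max instead of the default np.mean reports the worst column of each group rather than the average.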
Returns:
Type | Description |
---|---|
IndicesOutput | IndicesOutput object containing the disparate impact indices, with one row per variable group and one column per index. |
Source code in deel\fairsense\indices\standard_metrics.py
sobol_indices(inputs, n=1000, N=None)
Compute all Sobol' indices for all variables.
Warning
This index may fail silently if all values of one variable are identical (constant), which may occur when applying one-hot encoding with a large number of splits.
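Since the failure is silent, it can help to look for constant columns before computing the indices (the same caveat applies to cvm_indices). A minimal NumPy sketch; the helper name constant_columns is hypothetical and it assumes the data contains no missing values (NaN never compares equal to itself).

```python
import numpy as np

def constant_columns(X) -> np.ndarray:
    """Indices of the columns whose values are identical over all rows of X."""
    X = np.asarray(X)
    return np.flatnonzero(np.all(X == X[0], axis=0))

one_hot = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
])  # the third category is never observed
print(constant_columns(one_hot))      # [2]
print(constant_columns(one_hot[:2]))  # [0 1 2]
```

The second call shows how a small subset of the rows can turn every dummy column constant.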
Parameters:
Name | Type | Description | Default |
---|---|---|---|
inputs | IndicesInput | The fairness problem to study. | required |
n | | Number of samples used to compute the Sobol' indices. | 1000 |
N | | Number of samples used to compute the marginals. | None |
Returns:
Type | Description |
---|---|
IndicesOutput | IndicesOutput object containing the Sobol' indices, with one row per variable group and one column per index. |
Source code in deel\fairsense\indices\sobol.py
cvm_indices(index_input)
Compute the CVM indices of a fairness problem. Sets FairnessProblem.result to a DataFrame containing the indices.
Warning
This index may fail silently if all values of one variable are identical (constant), which may occur when applying one-hot encoding with a large number of splits. It may also yield erroneous results when used without enough data, which might occur when it is combined with confidence intervals.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index_input | IndicesInput | The fairness problem to study. | required |
Returns:
Type | Description |
---|---|
IndicesOutput | IndicesOutput object containing the CVM indices, with one row per variable group and one column per index. |
Source code in deel\fairsense\indices\cvm.py
with_confidence_intervals(n_splits=31, shuffle=False, random_state=None)
Function decorator that computes confidence intervals using the naive method: the input data is split into n_splits parts and the indices are computed on each split.
Warnings
- No correction is applied to the output: a small number of splits will lead to overconfident intervals, and a large number of splits will lead to a large variance due to the lack of data in each split.
- This function must be applied to one of the index computation functions of the indices module.
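The naive method can be sketched as a plain decorator that computes an index on disjoint parts of the data and summarises the spread of the results. This is a hypothetical stand-in, not the library's decorator: the name with_naive_confidence_intervals, the (s, target) array signature, the alpha parameter and the quantile-based interval are all assumptions (the real decorator wraps functions taking an IndicesInput). Like the original, it applies no correction.

```python
import functools
import numpy as np

def with_naive_confidence_intervals(n_splits=31, shuffle=False, random_state=None, alpha=0.05):
    """Hypothetical decorator: evaluate an index on n_splits disjoint parts of the data."""
    def decorator(index_fn):
        @functools.wraps(index_fn)
        def wrapper(s, target):
            idx = np.arange(len(target))
            if shuffle:  # shuffle once before splitting; order inside a split is irrelevant
                np.random.default_rng(random_state).shuffle(idx)
            values = np.array([index_fn(s[p], target[p]) for p in np.array_split(idx, n_splits)])
            low, high = np.quantile(values, [alpha / 2, 1 - alpha / 2])  # no correction
            return float(np.mean(values)), float(low), float(high)
        return wrapper
    return decorator

@with_naive_confidence_intervals(n_splits=4)
def parity_gap(s, y_pred):
    """Absolute gap between the positive rates of groups S=0 and S=1."""
    return abs(np.mean(y_pred[s == 0]) - np.mean(y_pred[s == 1]))

s = np.tile([0, 1], 8)
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
print(parity_gap(s, y_pred))  # (mean, lower bound, upper bound)
```

With four splits of four rows, the per-split gaps are 1.0, 0.5, 0.0 and 0.0, so the returned mean is 0.375. With so few splits the interval is crude, which is the overconfidence trade-off described in the warnings.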
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_splits | | Positive integer: number of splits. | 31 |
shuffle | | Whether to shuffle the data before splitting into batches. Note that the samples within each split will not be shuffled. | False |
random_state | | When | None |
Returns:
Type | Description |
---|---|
 | The original index computation function, enriched to compute confidence intervals. |
Source code in deel\fairsense\indices\confidence_intervals.py