Second Order Influence Calculator
When working with groups of data, it can be useful to account for the pairwise interactions between data points when estimating the influence of leaving out a large group. Basu et al. introduced a second-order formulation of group influence that takes these interactions into account.
As with the other methods based on inverse-hessian-vector products, an important part of the computation is carried out by objects of the class InverseHessianVectorProduct.
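To make the role of an inverse-hessian-vector product concrete, the sketch below computes H^{-1} v for a small symmetric positive-definite matrix in two ways: by solving the linear system directly, and with a hand-written conjugate gradient loop that only needs hessian-vector products. It uses NumPy rather than this library, the matrix is a random stand-in for a real hessian, and the names `hvp` and `conjugate_gradient` are illustrative and not part of deel.influenciae.

```python
import numpy as np

def conjugate_gradient(hvp, v, n_iters=50, tol=1e-10):
    """Approximately solve H x = v using only hessian-vector products."""
    x = np.zeros_like(v)
    r = v - hvp(x)          # residual of the linear system
    p = r.copy()            # current search direction
    rs_old = r @ r
    for _ in range(n_iters):
        if np.sqrt(rs_old) < tol:
            break           # residual is small enough, stop early
        hp = hvp(p)
        alpha = rs_old / (p @ hp)
        x = x + alpha * p
        r = r - alpha * hp
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 5))
hessian = a @ a.T + 5.0 * np.eye(5)   # symmetric positive-definite stand-in
v = rng.normal(size=5)

# Direct solve: needs the full hessian in memory.
ihvp_exact = np.linalg.solve(hessian, v)

# Iterative solve: only needs a function returning H @ u.
ihvp_cg = conjugate_gradient(lambda u: hessian @ u, v)
```

The direct solve is only practical when the hessian is small enough to build explicitly, whereas the iterative variant trades exactness for a lower memory footprint.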
SecondOrderInfluenceCalculator
A class implementing the necessary methods to compute the different influence quantities
(only for groups) using a second-order approximation, thus allowing us to take into
account the pairwise interactions between points inside the group. For small groups of
points, consider using the first order alternative if the computational cost is
too high.
__init__(self,
model: deel.influenciae.common.model_wrappers.InfluenceModel,
dataset: tf.Dataset,
ihvp_calculator: Union[str, deel.influenciae.common.inverse_hessian_vector_product.InverseHessianVectorProduct, deel.influenciae.common.inverse_hessian_vector_product.IHVPCalculator] = 'exact',
n_samples_for_hessian: Optional[int] = None,
shuffle_buffer_size: Optional[int] = 10000)
Parameters
- model : deel.influenciae.common.model_wrappers.InfluenceModel
  The TF2.X model implementing the InfluenceModel interface.
- dataset : tf.Dataset
  A batched TF dataset containing the training data over which the inverse-hessian-vector product is estimated.
- ihvp_calculator : Union[str, deel.influenciae.common.inverse_hessian_vector_product.InverseHessianVectorProduct, deel.influenciae.common.inverse_hessian_vector_product.IHVPCalculator] = 'exact'
  Either a string naming the IHVP method ('exact' or 'cgd'), an IHVPCalculator object, or an InverseHessianVectorProduct object.
- n_samples_for_hessian : Optional[int] = None
  The number of samples to draw from the provided training dataset for the hessian.
- shuffle_buffer_size : Optional[int] = 10000
  The buffer size of the training dataset's shuffle operation, used when drawing the samples for the hessian.
assert_compatible_datasets(dataset_a: tf.Dataset,
dataset_b: tf.Dataset) -> int
Asserts that the two datasets are compatible, i.e. that they contain the same number of points, and raises an error otherwise.
Parameters
- dataset_a : tf.Dataset
  First batched tensorflow dataset to check.
- dataset_b : tf.Dataset
  Second batched tensorflow dataset to check.
Return
- size : int
  The size of the dataset.
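The check described above amounts to counting the points in both datasets and comparing the totals. The following is a minimal sketch of that idea, using plain lists of batches as a stand-in for batched tf.Dataset objects. The helpers `count_points` and `check_same_size` are hypothetical names chosen for illustration, and this is not the library's implementation.

```python
from typing import Iterable, Sequence

def count_points(batched: Iterable[Sequence]) -> int:
    """Total number of points across all batches."""
    return sum(len(batch) for batch in batched)

def check_same_size(dataset_a, dataset_b) -> int:
    """Return the common number of points, or raise if the sizes differ."""
    size_a = count_points(dataset_a)
    size_b = count_points(dataset_b)
    if size_a != size_b:
        raise ValueError(
            f"Incompatible datasets: {size_a} points vs {size_b} points."
        )
    return size_a

# In this sketch only the totals are compared, so the batch sizes may differ.
group_train = [[0, 1, 2], [3, 4]]        # 5 points in batches of 3 and 2
group_eval = [[10, 11], [12, 13], [14]]  # 5 points in batches of 2, 2 and 1
size = check_same_size(group_train, group_eval)
```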
compute_influence_vector_group(self,
group: tf.Dataset) -> tf.Tensor
Computes the influence function vector of the whole group of points, i.e. an estimate of the difference in the model's weights when the group is removed from the training set.
Parameters
- group : tf.Dataset
  A batched TF dataset containing the group of points of which we wish to compute the influence of removal.
Return
- influence_group : tf.Tensor
  A tensor containing one vector for the whole group.
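To illustrate what "an estimate of the weights difference when removing the points" means, the sketch below uses a small ridge regression in NumPy, where the true weight change after removing a group can be obtained by refitting. It compares that ground truth with a first-order estimate and with a variant that also accounts for the curvature contributed by the removed group. This is not the second-order formula of Basu et al. nor the library's code: the model, the data, the fixed 1/n normalisation and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 3, 0.1
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def fit(X_part, y_part):
    """Minimize (1/n) * sum_i 0.5 * (x_i @ theta - y_i)**2 + 0.5 * lam * ||theta||^2.

    The 1/n factor always uses the full training set size (an assumption of
    this sketch), so removing points simply drops their terms from the sum.
    """
    hessian = X_part.T @ X_part / n + lam * np.eye(d)
    return np.linalg.solve(hessian, X_part.T @ y_part / n)

theta = fit(X, y)
group = np.arange(40)                      # indices of the group to remove
keep = np.setdiff1d(np.arange(n), group)

# Ground truth: refit without the group and look at the weight difference.
delta_true = fit(X[keep], y[keep]) - theta

# Summed per-sample loss gradients of the group at the trained weights.
residuals = X[group] @ theta - y[group]
group_grad = X[group].T @ residuals / n

# First-order estimate: uses the hessian of the full training objective.
hessian_full = X.T @ X / n + lam * np.eye(d)
delta_first_order = np.linalg.solve(hessian_full, group_grad)

# Accounting for the group's own curvature: exact for this quadratic loss.
hessian_without_group = hessian_full - X[group].T @ X[group] / n
delta_corrected = np.linalg.solve(hessian_without_group, group_grad)

err_first_order = np.linalg.norm(delta_first_order - delta_true)
err_corrected = np.linalg.norm(delta_corrected - delta_true)
```

With a fifth of the training set removed, the first-order estimate ignores a sizeable change in curvature, which is the kind of effect that motivates going beyond first order for large groups.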
estimate_influence_values_group(self,
group_train: tf.Dataset,
group_to_evaluate: Optional[tf.Dataset] = None) -> tf.Tensor
Computes Cook's distance of the whole group of points provided, giving a measure of the influence that the group has on the model's weights.
Parameters
- group_train : tf.Dataset
  A batched TF dataset containing the group of points we wish to remove.
- group_to_evaluate : Optional[tf.Dataset] = None
  A batched TF dataset containing the group of points with respect to which we wish to measure the influence of removing the training points.
Return
- influence_values_group : tf.Tensor
  A tensor containing one influence value for the whole group.
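As a rough illustration of how a weight-space influence vector can be condensed into a single value for an evaluation group, the sketch below takes the dot product between the summed loss gradients of the evaluation points and the influence vector, which is the first-order change of the evaluation loss. Whether this matches the library's exact definition of the returned value is an assumption. The ridge regression setup, the data and all names are made up for the example, and the influence vector is obtained here by refitting instead of through an influence calculator.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 120, 2, 0.05
true_weights = np.array([2.0, -1.0])
X = rng.normal(size=(n, d))
y = X @ true_weights + 0.2 * rng.normal(size=n)
X_eval = rng.normal(size=(10, d))
y_eval = X_eval @ true_weights + 0.2 * rng.normal(size=10)

def fit(X_part, y_part):
    """Ridge fit whose data term is always divided by the full size n."""
    hessian = X_part.T @ X_part / n + lam * np.eye(d)
    return np.linalg.solve(hessian, X_part.T @ y_part / n)

def total_loss(theta, X_part, y_part):
    """Sum of per-sample squared-error losses 0.5 * (x @ theta - y)**2."""
    return 0.5 * np.sum((X_part @ theta - y_part) ** 2)

theta = fit(X, y)
keep = np.arange(30, n)                              # drop the first 30 points
influence_vector = fit(X[keep], y[keep]) - theta     # stand-in, via refitting

# Summed loss gradients of the evaluation group at the trained weights.
eval_grad = X_eval.T @ (X_eval @ theta - y_eval)

# One scalar for the whole group: first-order change of the evaluation loss.
influence_value = eval_grad @ influence_vector

# For a quadratic loss, the actual change is the first-order value plus a
# curvature term, which makes the approximation error explicit.
actual_change = (
    total_loss(theta + influence_vector, X_eval, y_eval)
    - total_loss(theta, X_eval, y_eval)
)
curvature_term = 0.5 * influence_vector @ (X_eval.T @ X_eval) @ influence_vector
```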