Second Order Influence Calculator

When removing large groups of data points, it is useful to account for the pairwise interactions between the points' influences. Basu et al. therefore introduced a second-order formulation that captures these interactions:

\[ \mathcal{I}^{(2)} (\mathcal{U}, z_t) = \nabla_\theta \ell (\hat{\theta}, z_t) \left(\mathcal{I}^{(1)} (\mathcal{U}) + \mathcal{I}' (\mathcal{U})\right) \]

\[ \mathcal{I}^{(1)} (\mathcal{U}) = \frac{1 - 2 p}{(1 - p)^2} \frac{1}{|\mathcal{S}|} H_{\hat{\theta}}^{-1} \sum_{z \in \mathcal{U}} \nabla_\theta \ell (\hat{\theta}, z) \]

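\[ \mathcal{I}' (\mathcal{U}) = \frac{1}{(1 - p)^2} \frac{1}{|\mathcal{S}|^2} \sum_{z \in \mathcal{U}} H_{\hat{\theta}}^{-1} \nabla_\theta^2 \ell (\hat{\theta}, z) H_{\hat{\theta}}^{-1} \sum_{z' \in \mathcal{U}} \nabla_\theta \ell (\hat{\theta}, z') \]

where \( p = |\mathcal{U}| / |\mathcal{S}| \) is the fraction of the training set \( \mathcal{S} \) taken up by the group \( \mathcal{U} \), and \( H_{\hat{\theta}} \) is the Hessian of the empirical risk at \( \hat{\theta} \).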

As with the other methods based on inverse-Hessian-vector products, an important part of the computation is carried out by objects of the class InverseHessianVectorProduct.
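To make the three formulas concrete, the sketch below evaluates them by hand on a toy problem with a single scalar parameter, so every Hessian is just a number. It does not use the library: the loss \( \ell(\theta, z) = \frac{1}{2}(\theta x - y)^2 \), the helper names and the data are all assumptions chosen for illustration. It also assumes that \( p = |\mathcal{U}| / |\mathcal{S}| \), that \( H_{\hat{\theta}} \) is the mean Hessian over the training set, and that the two sums in \( \mathcal{I}' \) are multiplied together.

```python
# Toy, library-free evaluation of the second-order group influence formulas.
# Assumed setup (not from the library): scalar theta, loss 0.5 * (theta*x - y)**2.

def fit_theta(xs, ys):
    """Closed-form least-squares minimiser for the scalar model y ~ theta * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def grad(theta, x, y):
    """Gradient of the per-point loss with respect to theta."""
    return (theta * x - y) * x


def hess(x):
    """Second derivative of the per-point loss with respect to theta."""
    return x * x


def second_order_group_influence(xs, ys, group, x_t, y_t):
    """Return (I1, I_prime, I2) for removing the points indexed by `group`."""
    n = len(xs)
    p = len(group) / n                      # assumed: p = |U| / |S|
    theta = fit_theta(xs, ys)
    h = sum(hess(x) for x in xs) / n        # assumed: mean Hessian over S
    g_u = sum(grad(theta, xs[i], ys[i]) for i in group)
    h_u = sum(hess(xs[i]) for i in group)

    i1 = (1 - 2 * p) / (1 - p) ** 2 * (1 / n) * (1 / h) * g_u
    i_prime = 1 / (1 - p) ** 2 * (1 / n ** 2) * (1 / h) * h_u * (1 / h) * g_u
    i2 = grad(theta, x_t, y_t) * (i1 + i_prime)
    return i1, i_prime, i2


xs, ys = [1.0, 2.0, 1.0, 2.0], [1.0, 3.0, 2.0, 3.0]
i1, i_prime, i2 = second_order_group_influence(xs, ys, group=[0], x_t=2.0, y_t=2.0)
print(i1, i_prime, i2)
```

In a real model the scalar divisions by `h` become inverse-Hessian-vector products, which is the part the library delegates to its IHVP objects.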

Notebooks

Using the second order influence calculator

`SecondOrderInfluenceCalculator`

A class implementing the methods needed to compute the different influence quantities (for groups only) using a second-order approximation, which accounts for the pairwise interactions between points inside the group. For small groups of points, consider using the first-order alternative if the computational cost is too high.

`__init__(self, model: deel.influenciae.common.model_wrappers.InfluenceModel, dataset: tf.Dataset, ihvp_calculator: Union[str, deel.influenciae.common.inverse_hessian_vector_product.InverseHessianVectorProduct, deel.influenciae.common.inverse_hessian_vector_product.IHVPCalculator] = 'exact', n_samples_for_hessian: Optional[int] = None, shuffle_buffer_size: Optional[int] = 10000)`

Parameters

model : deel.influenciae.common.model_wrappers.InfluenceModel
- The TF2.X model implementing the InfluenceModel interface.
dataset : tf.Dataset
- A batched TF dataset containing the training dataset over which we will estimate the inverse-hessian-vector product.
ihvp_calculator : Union[str, deel.influenciae.common.inverse_hessian_vector_product.InverseHessianVectorProduct, deel.influenciae.common.inverse_hessian_vector_product.IHVPCalculator] = 'exact'
- Either a string containing the IHVP method ('exact' or 'cgd'), an IHVPCalculator object or an InverseHessianVectorProduct object.
n_samples_for_hessian : Optional[int] = None
- An integer indicating the number of samples to take from the provided train dataset.
shuffle_buffer_size : Optional[int] = 10000
- An integer indicating the buffer size of the train dataset's shuffle operation, used when choosing the samples for the Hessian.

`assert_compatible_datasets(dataset_a: tf.Dataset, dataset_b: tf.Dataset) -> int`

Asserts that the datasets are compatible, i.e. that they contain the same number of points. Otherwise, raises an error.

Parameters

dataset_a : tf.Dataset
- First batched tensorflow dataset to check.
dataset_b : tf.Dataset
- Second batched tensorflow dataset to check.

Return

size : int
- The size of the dataset.
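The check described above counts points, not batches, so two datasets can be compatible even when their batch sizes differ. The sketch below illustrates that idea only; it is not the library's implementation. Plain lists of batches stand in for batched TF datasets, and `count_points` and `check_same_size` are hypothetical names.

```python
# Illustrative stand-in for the compatibility check: plain Python lists of
# batches replace batched TF datasets, and both helper names are hypothetical.

def count_points(batched):
    """Total number of points across all batches."""
    return sum(len(batch) for batch in batched)


def check_same_size(batched_a, batched_b):
    """Return the common number of points, or raise if the counts differ."""
    size_a, size_b = count_points(batched_a), count_points(batched_b)
    if size_a != size_b:
        raise ValueError(f"Incompatible datasets: {size_a} points vs {size_b} points")
    return size_a


# Different batch layouts, same number of points: compatible.
train_group = [[0.1, 0.2], [0.3]]
eval_group = [[1.0], [2.0], [3.0]]
size = check_same_size(train_group, eval_group)
print(size)
```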

`compute_influence_vector_group(self, group: tf.Dataset) -> tf.Tensor`

Computes the influence function vector -- an estimation of the weights difference when removing the points -- of the whole group of points.

Parameters

group : tf.Dataset
- A batched TF dataset containing the group of points whose removal influence we wish to compute.

Return

influence_group : tf.Tensor
- A tensor containing one vector for the whole group.
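Since the influence vector is an estimate of the change in weights, it can be compared against actually refitting without the group whenever refitting is cheap. The sketch below does this for a scalar least-squares toy model. It is a hand-rolled illustration under stated assumptions, not a call into the library: the loss \( \frac{1}{2}(\theta x - y)^2 \), the function name `removal_estimates`, the data, the choice \( p = |\mathcal{U}| / |\mathcal{S}| \) and the use of the mean Hessian are all assumptions. The sign convention of the library's output may also differ.

```python
# Toy comparison of estimated vs. actual parameter shift when a group is removed.
# Assumed setup (not from the library): scalar theta, loss 0.5 * (theta*x - y)**2.

def removal_estimates(xs, ys, group):
    """Estimate and measure the shift in theta when the points in `group` are removed."""
    n = len(xs)
    p = len(group) / n                                   # assumed: p = |U| / |S|
    theta = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    h = sum(x * x for x in xs) / n                       # assumed: mean Hessian over S
    g_u = sum((theta * xs[i] - ys[i]) * xs[i] for i in group)
    h_u = sum(xs[i] ** 2 for i in group)

    # Plain first-order term: (1 / |S|) * H^-1 * sum of group gradients.
    first_order = g_u / (n * h)
    # Second-order vector: I^(1) + I' with the group-size-dependent factors.
    i1 = (1 - 2 * p) / (1 - p) ** 2 * first_order
    i_prime = h_u * g_u / ((1 - p) ** 2 * n ** 2 * h ** 2)

    # Ground truth: refit on the remaining points and take the difference.
    kept = [i for i in range(n) if i not in group]
    theta_kept = sum(xs[i] * ys[i] for i in kept) / sum(xs[i] ** 2 for i in kept)

    return {
        "first_order": first_order,
        "second_order": i1 + i_prime,
        "retrained": theta_kept - theta,
    }


result = removal_estimates([1.0, 2.0, 1.0, 2.0], [1.0, 3.0, 2.0, 3.0], group=[0])
print(result)
```

On this toy data the plain first-order term is 0.05, the second-order estimate is about 0.0533 and the refitted shift is about 0.0556. This is a single hand-picked case and does not show how the approximation behaves in general.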

`estimate_influence_values_group(self, group_train: tf.Dataset, group_to_evaluate: Optional[tf.Dataset] = None) -> tf.Tensor`

Computes Cook's distance of the whole group of points provided, giving a measure of the influence that the group has on the model's weights.

Parameters

group_train : tf.Dataset
- A batched TF dataset containing the group of points we wish to remove.
group_to_evaluate : Optional[tf.Dataset] = None
- A batched TF dataset containing the group of points with respect to which we wish to measure the influence of removing the training points.

Return

influence_values_group : tf.Tensor
- A tensor containing one influence value for the whole group.
