
Second Order Influence Calculator


When working with groups of data, it can be useful to account for the pairwise interactions between data-points when estimating the influence of leaving out a large group. Basu et al. therefore introduced a second-order formulation that takes these interactions into account:

\[ \mathcal{I}^{(2)} (\mathcal{U}, z_t) = \nabla_\theta \ell (\hat{\theta}, z_t) \left(\mathcal{I}^{(1)} (\mathcal{U}) + \mathcal{I}' (\mathcal{U})\right) \]
\[ \mathcal{I}^{(1)} (\mathcal{U}) = \frac{1 - 2 p}{(1 - p)^2} \frac{1}{|\mathcal{S}|} H_{\hat{\theta}}^{-1} \sum_{z \in \mathcal{U}} \nabla_\theta \ell (\hat{\theta}, z) \]
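\[ \mathcal{I}' (\mathcal{U}) = \frac{1}{(1 - p)^2} \frac{1}{|\mathcal{S}|^2} \sum_{z \in \mathcal{U}} H_{\hat{\theta}}^{-1} \nabla_\theta^2 \ell (\hat{\theta}, z) H_{\hat{\theta}}^{-1} \sum_{z' \in \mathcal{U}} \nabla_\theta \ell (\hat{\theta}, z') \]

Here, \(\mathcal{S}\) denotes the training set, \(\mathcal{U} \subseteq \mathcal{S}\) the group being removed, \(p = |\mathcal{U}| / |\mathcal{S}|\) the fraction of training points in the group, \(H_{\hat{\theta}}\) the hessian of the training loss at the optimal parameters \(\hat{\theta}\), and \(z_t\) the point on which the influence is evaluated.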
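To make the three equations above concrete, here is a minimal NumPy sketch that evaluates them on a toy least-squares problem. It is not the library's implementation: the helper `second_order_group_influence`, the toy data and the squared-error loss are all illustrative assumptions, and \(p\) is taken to be \(|\mathcal{U}| / |\mathcal{S}|\) as in Basu et al.

```python
import numpy as np

def second_order_group_influence(grads_u, hessians_u, hessian, n_train):
    """Return I1(U) + I'(U) following the equations above (hypothetical helper).

    grads_u:    (|U|, d) per-sample loss gradients at theta_hat
    hessians_u: (|U|, d, d) per-sample loss hessians at theta_hat
    hessian:    (d, d) mean hessian of the training loss, H
    n_train:    |S|, the size of the full training set
    """
    p = grads_u.shape[0] / n_train           # assumed: p = |U| / |S|
    g_u = grads_u.sum(axis=0)                # sum of gradients over U
    h_u = hessians_u.sum(axis=0)             # sum of hessians over U
    ihvp = np.linalg.solve(hessian, g_u)     # H^{-1} sum_z grad l(z)
    first = (1 - 2 * p) / (1 - p) ** 2 / n_train * ihvp
    cross = np.linalg.solve(hessian, h_u @ ihvp) / ((1 - p) ** 2 * n_train ** 2)
    return first + cross

# Toy least-squares problem with loss l(theta, (x, y)) = 0.5 * (x . theta - y)^2
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

residuals = X @ theta_hat - y
grads = residuals[:, None] * X               # per-sample gradients
hessians = X[:, :, None] * X[:, None, :]     # per-sample hessians x x^T
H = hessians.mean(axis=0)

group = np.arange(10)                        # indices of the group U
influence_vector = second_order_group_influence(
    grads[group], hessians[group], H, len(X))

# Influence of the group on a single evaluation point z_t
x_t, y_t = np.array([0.5, 0.5, 0.5]), 1.0
grad_test = (x_t @ theta_hat - y_t) * x_t
influence_value = grad_test @ influence_vector
```

The inverse hessian is never formed explicitly: both terms are obtained by solving linear systems, which is the role played by the inverse-hessian-vector product objects in the library.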

As with the other methods based on inverse-hessian-vector products, an important part of the computation is carried out by objects of the class InverseHessianVectorProduct.

Notebooks

SecondOrderInfluenceCalculator

A class implementing the methods needed to compute the different influence quantities (only for groups) using a second-order approximation, which accounts for the pairwise interactions between the points inside the group. For small groups of points, consider using the first-order alternative if the computational cost is too high.

__init__(self,
         model: deel.influenciae.common.model_wrappers.InfluenceModel,
         dataset: tf.Dataset,
         ihvp_calculator: Union[str, deel.influenciae.common.inverse_hessian_vector_product.InverseHessianVectorProduct, deel.influenciae.common.inverse_hessian_vector_product.IHVPCalculator] = 'exact',
         n_samples_for_hessian: Optional[int] = None,
         shuffle_buffer_size: Optional[int] = 10000)

Parameters

  • model : deel.influenciae.common.model_wrappers.InfluenceModel

    • The TF2.X model implementing the InfluenceModel interface.

  • dataset : tf.Dataset

    • A batched TF dataset containing the training dataset over which we will estimate the inverse-hessian-vector product.

  • ihvp_calculator : Union[str, deel.influenciae.common.inverse_hessian_vector_product.InverseHessianVectorProduct, deel.influenciae.common.inverse_hessian_vector_product.IHVPCalculator] = 'exact'

    • Either a string containing the IHVP method ('exact' or 'cgd'), an IHVPCalculator object or an InverseHessianVectorProduct object.

  • n_samples_for_hessian : Optional[int] = None

    • An integer indicating the number of samples to take from the provided training dataset when estimating the hessian.

  • shuffle_buffer_size : Optional[int] = 10000

    • An integer indicating the buffer size of the shuffle operation applied to the training dataset when drawing the samples for the hessian.
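The two string options for `ihvp_calculator` correspond to two ways of obtaining an inverse-hessian-vector product. The sketch below contrasts them on a small symmetric positive-definite matrix, assuming that 'exact' refers to a direct linear solve and 'cgd' to a conjugate-gradient solver that only needs hessian-vector products. The functions `ihvp_exact` and `ihvp_cg` are hypothetical stand-ins written in NumPy, not the library's classes.

```python
import numpy as np

def ihvp_exact(hessian, vector):
    """Direct solve of H x = v (requires the full hessian in memory)."""
    return np.linalg.solve(hessian, vector)

def ihvp_cg(hvp, vector, n_iters=50, tol=1e-10):
    """Conjugate gradient on H x = v, using only hessian-vector products."""
    x = np.zeros_like(vector)
    residual = vector - hvp(x)
    direction = residual.copy()
    rs_old = residual @ residual
    for _ in range(n_iters):
        if np.sqrt(rs_old) < tol:            # converged
            break
        h_dir = hvp(direction)
        alpha = rs_old / (direction @ h_dir)
        x = x + alpha * direction
        residual = residual - alpha * h_dir
        rs_new = residual @ residual
        direction = residual + (rs_new / rs_old) * direction
        rs_old = rs_new
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
hessian = A @ A.T + 5 * np.eye(5)            # symmetric positive definite
v = rng.normal(size=5)

exact = ihvp_exact(hessian, v)
approx = ihvp_cg(lambda u: hessian @ u, v)   # the hessian is only accessed through products
```

The iterative variant trades some accuracy for memory: it never materializes or inverts the hessian, which matters when the number of model parameters is large.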

assert_compatible_datasets(dataset_a: tf.Dataset,
                           dataset_b: tf.Dataset) -> int

Asserts that the two datasets are compatible, i.e. that they contain the same number of points. Otherwise, raises an error.

Parameters

  • dataset_a : tf.Dataset

    • First batched tensorflow dataset to check.

  • dataset_b : tf.Dataset

    • Second batched tensorflow dataset to check.

Return

  • size : int

    • The size of the dataset.
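The kind of check described here can be sketched without TensorFlow by treating a batched dataset as any iterable of batches. The helpers `count_points` and `assert_same_size` below are hypothetical and only illustrate the idea; the sketch assumes that compatibility is judged on the number of points rather than on the number of batches.

```python
import numpy as np

def count_points(batched_dataset):
    """Total number of points across all batches."""
    return sum(len(batch) for batch in batched_dataset)

def assert_same_size(dataset_a, dataset_b):
    """Return the common size, or raise if the sizes differ."""
    size_a, size_b = count_points(dataset_a), count_points(dataset_b)
    if size_a != size_b:
        raise ValueError(
            f"Datasets are not compatible: {size_a} points vs {size_b} points")
    return size_a

# Two stand-in datasets with 5 points each, batched differently
dataset_a = [np.zeros((2, 3)), np.zeros((2, 3)), np.zeros((1, 3))]
dataset_b = [np.zeros((5, 3))]
size = assert_same_size(dataset_a, dataset_b)
```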


compute_influence_vector_group(self,
                               group: tf.Dataset) -> tf.Tensor

Computes the influence function vector of the whole group of points, i.e. an estimate of the difference in the model's weights when the group is removed from the training set.

Parameters

  • group : tf.Dataset

    • A batched TF dataset containing the group of points for which we wish to compute the influence of removal.

Return

  • influence_group : tf.Tensor

    • A tensor containing one vector for the whole group.
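Because the influence vector is meant to estimate a weights difference, it can be compared against actual retraining on a problem small enough to refit. The NumPy sketch below does this for a toy least-squares model, assuming \(p = |\mathcal{U}| / |\mathcal{S}|\) and a squared-error loss. Note that `first_order` here is only the \(\mathcal{I}^{(1)}\) term of the second-order formula, not the output of a separate first-order calculator, and none of this code comes from the library.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.5 * rng.normal(size=n)

def fit(features, targets):
    """Least-squares fit, i.e. the minimizer of the mean squared-error loss."""
    return np.linalg.lstsq(features, targets, rcond=None)[0]

theta_hat = fit(X, y)

# Group U to remove: the first 20 training points
mask = np.zeros(n, dtype=bool)
mask[:20] = True
X_u, y_u = X[mask], y[mask]
p = mask.sum() / n                           # assumed: p = |U| / |S|

H = X.T @ X / n                              # mean hessian of the training loss
g_u = X_u.T @ (X_u @ theta_hat - y_u)        # sum of gradients over U
H_u = X_u.T @ X_u                            # sum of hessians over U

ihvp = np.linalg.solve(H, g_u)
first_order = (1 - 2 * p) / (1 - p) ** 2 / n * ihvp
second_order = first_order + np.linalg.solve(H, H_u @ ihvp) / ((1 - p) ** 2 * n ** 2)

# Ground truth: retrain without the group and take the weights difference
exact = fit(X[~mask], y[~mask]) - theta_hat

err_first = np.linalg.norm(first_order - exact)
err_second = np.linalg.norm(second_order - exact)
```

On this quadratic problem, adding the interaction term \(\mathcal{I}'\) brings the estimate closer to the retrained weights difference than the \(\mathcal{I}^{(1)}\) term alone; for non-quadratic losses both remain approximations.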


estimate_influence_values_group(self,
                                group_train: tf.Dataset,
                                group_to_evaluate: Optional[tf.Dataset] = None) -> tf.Tensor

Computes Cook's distance of the whole group of points provided, giving a measure of the influence that the group carries on the model's weights.

Parameters

  • group_train : tf.Dataset

    • A batched TF dataset containing the group of points we wish to remove.

  • group_to_evaluate : Optional[tf.Dataset] = None

    • A batched TF dataset containing the group of points with respect to which we wish to measure the influence of removing the training points.

Return

  • influence_values_group : tf.Tensor

    • A tensor containing one influence value for the whole group.