Similar-Examples¶
Similar Examples designates all methods that, given an input sample, search for the most similar training samples according to a distance function (the distance parameter). The search space can also be defined with a projection function (see Projections). This function should map an input sample to the search space, where the distance function is defined and meaningful (e.g. the latent space of a Convolutional Neural Network).
A K-Nearest Neighbors (KNN) search is then performed to find the most similar samples in the search space.
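The project-then-search idea can be illustrated without the library. The following is a minimal sketch in plain Python, not Xplique's implementation: the names knn_search and toy_projection are hypothetical, and the toy projection (keeping only the first feature) merely stands in for a model's latent space.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_search(
    query: Vector,
    cases: List[Vector],
    k: int,
    projection: Callable[[Vector], Vector],
    distance: Callable[[Vector, Vector], float] = euclidean,
) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs of the k cases closest to the query,
    with distances measured in the projected (search) space."""
    projected_query = projection(query)
    scored = [
        (index, distance(projected_query, projection(case)))
        for index, case in enumerate(cases)
    ]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


def toy_projection(x: Vector) -> Vector:
    """Hypothetical projection: keep only the first feature."""
    return [x[0]]


def identity(x: Vector) -> Vector:
    """No projection: search directly in the input space."""
    return list(x)


cases = [[0.0, 9.0], [1.0, 0.0], [5.0, 5.0], [1.2, 7.0]]
query = [0.9, 0.0]

projected_neighbors = knn_search(query, cases, k=2, projection=toy_projection)
input_space_neighbors = knn_search(query, cases, k=2, projection=identity)
```

With the toy projection, the case at index 3 becomes a close neighbor even though it is far from the query in the input space, which is why the choice of projection determines what "similar" means.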
Example¶
from typing import Optional, Union

import numpy as np
import tensorflow as tf

from xplique.example_based import SimilarExamples

cases_dataset = ... # load the training dataset
targets = ... # load the one-hot encoding of predicted labels of the training dataset

# parameters
k = 5
distance = "euclidean"
case_returns = ["examples", "nuns"]

# define the projection function
def custom_projection(
    inputs: Union[tf.Tensor, np.ndarray],
    targets: Optional[Union[tf.Tensor, np.ndarray]] = None,
):
    '''
    Example of projection,
    inputs are the elements to project.
    targets are optional parameters to orient the projection.
    '''
    projected_inputs = ...  # do some magic on inputs, it should use the model.
    return projected_inputs

# instantiate the SimilarExamples object
sim_ex = SimilarExamples(
    cases_dataset=cases_dataset,
    targets_dataset=targets,
    k=k,
    projection=custom_projection,
    distance=distance,
)

# load the test samples and targets
test_samples = ... # load the test samples to search for
test_targets = ... # load the one-hot encoding of the test samples' predictions
# search the most similar samples with the SimilarExamples method
similar_samples = sim_ex.explain(test_samples, test_targets)
Notebooks¶
SimilarExamples¶
Class for the similar-examples method. It searches for the k nearest neighbors
of an input in the projected space (defined by the projection)
using the provided distance.
__init__(self,
         cases_dataset: ~DatasetOrTensor,
         labels_dataset: Optional[~DatasetOrTensor] = None,
         targets_dataset: Optional[~DatasetOrTensor] = None,
         k: int = 1,
         projection: Union[xplique.example_based.projections.base.Projection, Callable] = None,
         case_returns: Union[List[str], str] = 'examples',
         batch_size: Optional[int] = None,
         distance: Union[int, str, Callable] = 'euclidean')¶
Parameters
-
cases_dataset : ~DatasetOrTensor
The dataset used to train the model; examples are extracted from this dataset.
All datasets (cases, labels, and targets) should be of the same type.
Supported types are: tf.data.Dataset, torch.utils.data.DataLoader, tf.Tensor, np.ndarray, torch.Tensor.
For datasets with multiple columns, the first column is assumed to be the cases, the second the labels, and the third the targets.
Warning: datasets tend to reshuffle at each iteration; make sure the datasets are not reshuffled, since examples are retrieved by their index in the dataset.
-
labels_dataset : Optional[~DatasetOrTensor] = None
Labels associated with the examples in the cases_dataset.
It should have the same type as cases_dataset.
-
targets_dataset : Optional[~DatasetOrTensor] = None
Targets associated with the cases_dataset for dataset projection, oftentimes the one-hot encoding of a model's predictions. See projection for details.
It should have the same type as cases_dataset.
It is not necessary for all projections.
Furthermore, projections that require it compute it internally by default.
-
k : int = 1
The number of examples to retrieve per input.
-
projection : Union[xplique.example_based.projections.base.Projection, Callable] = None
Projection or Callable that projects samples from the input space to the search space.
The search space should be a space where distances are relevant for the model.
It should not be None; otherwise, the model is not involved and thus not explained.
Example of Callable:
def custom_projection(
    inputs: Union[tf.Tensor, np.ndarray],
    targets: Optional[Union[tf.Tensor, np.ndarray]] = None,
):
    '''
    Example of projection,
    inputs are the elements to project.
    targets are optional parameters to orient the projection.
    '''
    projected_inputs = ...  # do some magic on inputs, it should use the model.
    return projected_inputs
-
case_returns : Union[List[str], str] = 'examples'
String or list of strings naming the elements to return in self.explain().
See the base class returns property for more details.
-
batch_size : Optional[int] = None
Number of samples treated simultaneously for projection and search.
Ignored if cases_dataset is a batched tf.data.Dataset or a batched torch.utils.data.DataLoader.
-
distance : Union[int, str, Callable] = 'euclidean'
Distance for the knn search method. It can be an integer, a string in {"manhattan", "euclidean", "cosine", "chebyshev", "inf"}, or a Callable, by default "euclidean".
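To make the accepted distance values concrete, here is a small plain-Python sketch of how such a parameter could be resolved into a callable. It is illustrative only and not the library's code: resolve_distance is a hypothetical helper, and three points are assumptions rather than documented behaviour: that an integer is read as the order p of a Minkowski norm, that "inf" is equivalent to "chebyshev", and that "cosine" means one minus the cosine similarity.

```python
import math
from typing import Callable, Sequence, Union

Vector = Sequence[float]
DistanceFn = Callable[[Vector, Vector], float]


def minkowski(a: Vector, b: Vector, p: int) -> float:
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def chebyshev(a: Vector, b: Vector) -> float:
    """Largest absolute coordinate difference (L-infinity norm)."""
    return max(abs(x - y) for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Assumed convention: 1 - cosine similarity (undefined for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def resolve_distance(distance: Union[int, str, DistanceFn]) -> DistanceFn:
    """Hypothetical helper turning an int, a name, or a callable into a distance."""
    if callable(distance):
        return distance
    if isinstance(distance, int):
        if distance < 1:
            raise ValueError("the order of the norm must be >= 1")
        # Assumption: an integer is the order p of a Minkowski norm.
        return lambda a, b: minkowski(a, b, distance)
    named = {
        "manhattan": lambda a, b: minkowski(a, b, 1),
        "euclidean": lambda a, b: minkowski(a, b, 2),
        "cosine": cosine,
        "chebyshev": chebyshev,
        "inf": chebyshev,  # assumption: "inf" is the L-infinity norm
    }
    if distance not in named:
        raise ValueError(f"unknown distance: {distance!r}")
    return named[distance]


origin, point = [0.0, 0.0], [3.0, 4.0]
manhattan_value = resolve_distance("manhattan")(origin, point)
euclidean_value = resolve_distance("euclidean")(origin, point)
chebyshev_value = resolve_distance("chebyshev")(origin, point)
```

A custom callable passes through unchanged, so any function taking two vectors and returning a number fits the same slot.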
explain(self,
        inputs: Union[tensorflow.python.framework.tensor.Tensor, numpy.ndarray],
        targets: Union[tensorflow.python.framework.tensor.Tensor, numpy.ndarray, None] = None)¶
Return the relevant examples to explain the (inputs, targets).
It projects inputs into the search space with self.projection
and finds examples with self.search_method.
Parameters
-
inputs : Union[tensorflow.python.framework.tensor.Tensor, numpy.ndarray]
Tensor or Array. Input samples to be explained.
Expected shape among (N, W), (N, T, W), (N, W, H, C).
More information in the documentation.
-
targets : Union[tensorflow.python.framework.tensor.Tensor, numpy.ndarray, None] = None
Targets associated with the inputs for projection.
Shape: (n, nb_classes) where n is the number of samples and nb_classes is the number of classes.
It is used in the projection, but the projection can compute it internally.
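As an illustration of the expected targets shape, the sketch below builds one-hot targets of shape (n, nb_classes) from per-class scores in plain Python. The function name one_hot_predictions and the score values are made up for the example; it is not how the library computes targets internally.

```python
from typing import List, Sequence


def one_hot_predictions(scores: Sequence[Sequence[float]]) -> List[List[float]]:
    """Turn per-class scores of shape (n, nb_classes) into one-hot rows
    marking the predicted (highest-scoring) class of each sample."""
    targets = []
    for row in scores:
        predicted = max(range(len(row)), key=row.__getitem__)
        targets.append([1.0 if c == predicted else 0.0 for c in range(len(row))])
    return targets


# Made-up scores for n = 2 samples and nb_classes = 3.
scores = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]
targets = one_hot_predictions(scores)
```

With TensorFlow tensors, the same result can be obtained by combining tf.argmax and tf.one_hot on the model's outputs.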
Return
-
return_dict
Dictionary with the elements listed in self.returns.
The elements that can be returned are defined by the _returns_possibilities static attribute of the class.