VisualRepresentation

class VisualRepresentation(images: Sequence[str | Path | Tensor], encoder: str | Module, layer_name: str, max_id: int | None = None, shape: int | Sequence[int] | None = None, transforms: Sequence | None = None, encoder_kwargs: Mapping[str, Any] | None = None, batch_size: int = 32, trainable: bool = True, **kwargs)

Bases: Representation

Visual representations using a torchvision model.

Initialize the representations.

Parameters:
  • images (Sequence[str | Path | Tensor]) – The images, either as tensors, or paths to image files.

  • encoder (str | Module) – The encoder to use. If given as a string, it is looked up in torchvision.models.

  • layer_name (str) – The name of the model layer from which to extract features; cf. torchvision.models.feature_extraction.create_feature_extractor().

  • max_id (int | None) – The number of representations. If given, it must match the number of images.

  • shape (int | Sequence[int] | None) – The shape of an individual representation. If provided, it must match the encoder's output dimension.

  • transforms (Sequence | None) – Transformations to apply to the images. Note that stochastic transformations will make the representations stochastic as well.

  • encoder_kwargs (Mapping[str, Any] | None) – Additional keyword-based parameters passed to the encoder upon instantiation.

  • batch_size (int) – The batch size to use during encoding.

  • trainable (bool) – Whether the encoder should be trainable.

  • kwargs – Additional keyword-based parameters passed to Representation.

Raises:

ValueError – If max_id is provided and does not match the number of images.
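The two documented constraints above can be sketched in plain Python. This is a schematic re-implementation for illustration only, not the library's actual code: `resolve_max_id` mirrors the documented `ValueError` when `max_id` disagrees with the number of images, and `iter_batches` shows how a sequence of images would be split into chunks of at most `batch_size` for encoding (both function names are hypothetical).

```python
from typing import Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")


def resolve_max_id(images: Sequence[T], max_id: Optional[int] = None) -> int:
    """Return the number of representations.

    Mirrors the documented check: if max_id is given, it must match
    the number of images, otherwise a ValueError is raised.
    """
    if max_id is not None and max_id != len(images):
        raise ValueError(
            f"max_id ({max_id}) does not match the number of images ({len(images)})"
        )
    return len(images)


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield successive batches of at most batch_size items, as used during encoding."""
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


# Five "images" encoded with batch_size=2 yield batches of sizes 2, 2, and 1.
images = ["a.png", "b.png", "c.png", "d.png", "e.png"]
num_representations = resolve_max_id(images, max_id=5)
batch_sizes = [len(batch) for batch in iter_batches(images, batch_size=2)]
```

Omitting `max_id` (leaving it as `None`) simply infers the count from the images, which is why the parameter is optional.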