VisualRepresentation

class VisualRepresentation(images: Sequence[str | Path | Tensor], encoder: str | Module, layer_name: str, max_id: int | None = None, shape: int | Sequence[int] | None = None, transforms: Sequence | None = None, encoder_kwargs: Mapping[str, Any] | None = None, batch_size: int = 32, trainable: bool = True, **kwargs)

Bases: Representation

Visual representations using a torchvision model.

Initialize the representations.

Parameters:
  • images (Sequence[str | Path | Tensor]) – the images, given either as tensors or as paths to image files.

  • encoder (str | Module) – the encoder to use. If given as a string, it is looked up in torchvision.models.

  • layer_name (str) – the name of the model layer from which to extract features, cf. torchvision.models.feature_extraction.create_feature_extractor(). See the first example below for how to discover valid layer names.

  • max_id (int | None) – the number of representations. If given, it must match the number of images.

  • shape (int | Sequence[int] | None) – the shape of an individual representation. If provided, it must match the encoder's output dimension.

  • transforms (Sequence | None) – transformations to apply to the images. Note that stochastic transformations will result in stochastic representations, too.

  • encoder_kwargs (Mapping[str, Any] | None) – additional keyword-based parameters passed to the encoder upon instantiation.

  • batch_size (int) – the batch size to use during encoding.

  • trainable (bool) – whether the encoder should be trainable.

  • kwargs – additional keyword-based parameters passed to Representation.__init__().

Raises:

ValueError – if max_id is provided and does not match the number of images.
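
Examples:

Because layer_name must name a valid feature-extraction node, it can help to list the available node names first. A minimal sketch using plain torchvision utilities (the choice of resnet18 is illustrative):

    from torchvision.models import resnet18
    from torchvision.models.feature_extraction import get_graph_node_names

    # List the node names accepted by create_feature_extractor().
    model = resnet18()
    train_nodes, eval_nodes = get_graph_node_names(model)

    # For resnet18, the final eval-mode nodes include "avgpool",
    # "flatten", and "fc"; any such name can serve as layer_name.
    print(eval_nodes[-3:])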
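
A minimal instantiation sketch. The import path, the image file names, and the call convention inherited from Representation are assumptions; adapt them to your installation:

    from pathlib import Path
    import torch

    # NOTE: hypothetical import path; adjust to the module that defines
    # VisualRepresentation in your installation.
    from mypackage.nn import VisualRepresentation

    # Placeholder image files.
    images = [Path("img0.png"), Path("img1.png"), Path("img2.png")]

    representation = VisualRepresentation(
        images=images,
        encoder="resnet18",    # looked up in torchvision.models
        layer_name="avgpool",  # cf. the node-name sketch above
        max_id=len(images),    # optional; must equal len(images) if given
        batch_size=2,
        trainable=False,       # keep the encoder frozen
    )

    # Assuming the base Representation's usual call convention, passing
    # indices returns the corresponding representations.
    x = representation(indices=torch.as_tensor([0, 2]))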