VisualRepresentation

class VisualRepresentation(images: Sequence[str | Path | Tensor], encoder: str | Module, layer_name: str, max_id: int | None = None, shape: int | Sequence[int] | None = None, transforms: Sequence | None = None, encoder_kwargs: Mapping[str, Any] | None = None, batch_size: int = 32, trainable: bool = True, **kwargs)

Bases: Representation

Visual representations using a torchvision model.

Initialize the representations.

Parameters:
  • images (Sequence[str | Path | Tensor]) – The images, either as tensors, or paths to image files.

  • encoder (str | Module) – The encoder to use. If given as a string, it is looked up in torchvision.models.

  • layer_name (str) – The name of the model layer from which to extract features; cf. torchvision.models.feature_extraction.create_feature_extractor().

  • max_id (int | None) – The number of representations. If given, it must match the number of images.

  • shape (int | Sequence[int] | None) – The shape of an individual representation. If provided, it must match the encoder's output dimension.

  • transforms (Sequence | None) – Transformations to apply to the images. Note that stochastic transformations will make the representations stochastic as well.

  • encoder_kwargs (Mapping[str, Any] | None) – Additional keyword-based parameters passed to the encoder upon instantiation.

  • batch_size (int) – The batch size to use during encoding.

  • trainable (bool) – Whether the encoder should be trainable.

  • kwargs – Additional keyword-based parameters passed to Representation.

Raises:

ValueError – If max_id is provided and does not match the number of images.
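The two documented constraints above can be sketched in plain Python. This is a schematic re-implementation for illustration only, not the library's actual code: `resolve_max_id` mirrors the documented `ValueError` when `max_id` disagrees with the number of images, and `iter_batches` shows how a sequence of images would be split into chunks of at most `batch_size` for encoding (both function names are hypothetical).

```python
from typing import Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")


def resolve_max_id(images: Sequence[T], max_id: Optional[int] = None) -> int:
    """Return the number of representations.

    Mirrors the documented check: if max_id is given, it must match
    the number of images, otherwise a ValueError is raised.
    """
    if max_id is not None and max_id != len(images):
        raise ValueError(
            f"max_id ({max_id}) does not match the number of images ({len(images)})"
        )
    return len(images)


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield successive batches of at most batch_size items, as used during encoding."""
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


# Five "images" encoded with batch_size=2 yield batches of sizes 2, 2, and 1.
images = ["a.png", "b.png", "c.png", "d.png", "e.png"]
num_representations = resolve_max_id(images, max_id=5)
batch_sizes = [len(batch) for batch in iter_batches(images, batch_size=2)]
```

Omitting `max_id` (leaving it as `None`) simply infers the count from the images, which is why the parameter is optional.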