Utilities

Utilities for neural network components.

class TransformerEncoder(pretrained_model_name_or_path, max_length=None)

A combination of a tokenizer and a model.

Initialize the encoder.

Parameters
  • pretrained_model_name_or_path (str) – the name of a pretrained model or a path to one; cf. transformers.AutoModel.from_pretrained()

  • max_length (Optional[int]) – the maximum number of tokens to pad/trim the labels to; must be > 0. If None, 512 is used.

Raises

ImportError – if the transformers library could not be imported
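To make the pad/trim role of max_length concrete, here is a minimal sketch in plain Python. The helper pad_or_trim and its pad_id argument are hypothetical and not part of this class. It assumes dynamic padding (every sequence is first trimmed to max_length, then right-padded to the longest remaining sequence), similar in spirit to calling a Hugging Face tokenizer with padding=True, truncation=True and a max_length.

```python
from typing import List, Sequence, Tuple


def pad_or_trim(
    token_ids: Sequence[Sequence[int]],
    max_length: int = 512,
    pad_id: int = 0,  # hypothetical padding id
) -> Tuple[List[List[int]], List[List[int]]]:
    """Trim each sequence to max_length, then right-pad all to a common length.

    Returns the padded ids and an attention mask (1 = real token, 0 = padding).
    """
    if max_length <= 0:
        raise ValueError("max_length must be > 0")
    # Trim first, so that no sequence is longer than max_length.
    trimmed = [list(ids[:max_length]) for ids in token_ids]
    # Pad to the longest trimmed sequence, which is at most max_length.
    target = max((len(ids) for ids in trimmed), default=0)
    padded = [ids + [pad_id] * (target - len(ids)) for ids in trimmed]
    mask = [[1] * len(ids) + [0] * (target - len(ids)) for ids in trimmed]
    return padded, mask


padded, mask = pad_or_trim([[1, 2, 3], [4]], max_length=2)
print(padded, mask)  # [[1, 2], [4, 0]] [[1, 1], [1, 0]]
```

The mask lets a model ignore padding positions, so padding does not change a label's encoding.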

encode_all(labels, batch_size=1)

Encode all labels in batches, in inference mode.

Parameters
  • labels (Sequence[str]) – a sequence of strings to encode

  • batch_size (int) – the batch size to use for encoding the labels. batch_size=1 means that the labels are encoded one by one, while batch_size=len(labels) corresponds to encoding all of them at once. Larger batch sizes increase memory requirements but may be computationally more efficient.

Return type

FloatTensor

Returns

a tensor of shape (len(labels), dim) containing the encodings of all labels
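The batching described by batch_size can be sketched as follows. This is a minimal plain-Python illustration, not the library's implementation: encode_batch is a hypothetical callable standing in for the tokenizer-plus-model step, and lists of floats stand in for the FloatTensor. With PyTorch one would typically run each batch under torch.inference_mode() and concatenate the per-batch tensors with torch.cat.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def encode_all(
    labels: Sequence[str],
    encode_batch: Callable[[Sequence[str]], List[Vector]],  # hypothetical encoder
    batch_size: int = 1,
) -> List[Vector]:
    """Encode labels in chunks of batch_size and concatenate the results in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    result: List[Vector] = []
    for start in range(0, len(labels), batch_size):
        batch = labels[start : start + batch_size]
        # One call per batch: larger batches mean fewer calls but more memory per call.
        result.extend(encode_batch(batch))
    return result


def toy_encoder(batch: Sequence[str]) -> List[Vector]:
    # Hypothetical 2-dimensional "encoding": label length and number of "a" characters.
    return [[float(len(s)), float(s.count("a"))] for s in batch]


print(encode_all(["a", "bb", "aaa"], toy_encoder, batch_size=2))
# [[1.0, 1.0], [2.0, 0.0], [3.0, 3.0]]
```

The result has one row per label regardless of batch_size; only the number of encoder calls changes.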

forward(labels)

Encode labels via the provided model and tokenizer.

Return type

FloatTensor
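The tokenizer-then-model pipeline behind forward can be illustrated with a toy, dependency-free sketch. ToyTokenizer and ToyModel are hypothetical stand-ins, not transformers classes: the tokenizer splits on whitespace and pads, and the model looks up a fixed vector per token id and mean-pools over non-padding tokens. A real transformer contextualizes tokens, and how it pools them into one vector per label depends on the implementation; mean pooling here is only an assumption for illustration.

```python
from typing import Dict, List, Sequence, Tuple


class ToyTokenizer:
    """Hypothetical whitespace tokenizer: 0 = padding id, 1 = unknown-word id."""

    def __init__(self, vocab: Sequence[str], max_length: int = 512) -> None:
        self.ids: Dict[str, int] = {w: i + 2 for i, w in enumerate(vocab)}
        self.max_length = max_length

    def __call__(self, labels: Sequence[str]) -> Tuple[List[List[int]], List[List[int]]]:
        # Map words to ids and trim each label to max_length tokens.
        seqs = [[self.ids.get(w, 1) for w in s.split()][: self.max_length] for s in labels]
        width = max((len(s) for s in seqs), default=0)
        ids = [s + [0] * (width - len(s)) for s in seqs]
        mask = [[1] * len(s) + [0] * (width - len(s)) for s in seqs]
        return ids, mask


class ToyModel:
    """Hypothetical model: fixed vector per token id, mean-pooled over real tokens."""

    def __init__(self, embeddings: Dict[int, List[float]], dim: int) -> None:
        self.embeddings = embeddings
        self.dim = dim

    def __call__(self, ids: List[List[int]], mask: List[List[int]]) -> List[List[float]]:
        out = []
        for row_ids, row_mask in zip(ids, mask):
            total, count = [0.0] * self.dim, 0
            for token_id, m in zip(row_ids, row_mask):
                if m:  # skip padding positions
                    vec = self.embeddings.get(token_id, [0.0] * self.dim)
                    total = [t + v for t, v in zip(total, vec)]
                    count += 1
            out.append([t / count for t in total] if count else total)
        return out


def forward(labels: Sequence[str], tokenizer: ToyTokenizer, model: ToyModel) -> List[List[float]]:
    # Tokenize (with trimming and padding), then let the model produce one vector per label.
    ids, mask = tokenizer(labels)
    return model(ids, mask)


tok = ToyTokenizer(["red", "apple"])  # red -> 2, apple -> 3
mdl = ToyModel({1: [0.0, 0.0], 2: [1.0, 0.0], 3: [0.0, 1.0]}, dim=2)
print(forward(["red apple", "red"], tok, mdl))  # [[0.5, 0.5], [1.0, 0.0]]
```

Because padding is masked out, "red" gets the same encoding whether it is encoded alone or in a batch with a longer label.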