TokenizationRepresentation

class TokenizationRepresentation(assignment: Tensor, token_representation: str | Representation | type[Representation] | None = None, token_representation_kwargs: Mapping[str, Any] | None = None, shape: int | Sequence[int] | None = None, **kwargs)

Bases: Representation

A module holding the result of tokenization.

Initialize the tokenization.

Parameters:
  • assignment (Tensor) – shape: (n, num_chosen_tokens). The token assignment, i.e., the indices of the tokens chosen for each of the n IDs.

  • token_representation (str | Representation | type[Representation] | None) – shape: (num_total_tokens, *shape). The token representations, given as a pre-instantiated representation, a class, or the name of a class.

  • token_representation_kwargs (Mapping[str, Any] | None) – additional keyword-based parameters for the token representation

  • shape (int | Sequence[int] | None) – the shape of an individual representation. If provided, it has to match.

  • kwargs – additional keyword-based parameters passed to Representation.__init__()

Raises:

ValueError – if there’s a mismatch between the representation size and the vocabulary size
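The shape annotations above can be read as an index lookup: each of the n rows of the assignment selects num_chosen_tokens rows from the token representations. The following is a minimal sketch of that reading, not the library's implementation. Plain Python lists stand in for tensors so that it runs without dependencies, and the helper name lookup_tokens as well as the exact form of the size check are assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def lookup_tokens(
    assignment: Sequence[Sequence[int]],
    token_table: Sequence[Vector],
) -> List[List[Vector]]:
    """Gather token vectors for every ID (illustrative, not the library code)."""
    # Assumption: every token index must address a row of the token table.
    largest = max(max(row) for row in assignment)
    if largest >= len(token_table):
        raise ValueError(
            f"assignment refers to token {largest}, but only "
            f"{len(token_table)} token representations exist"
        )
    # Result shape: (n, num_chosen_tokens, *shape)
    return [[list(token_table[t]) for t in row] for row in assignment]


# n = 3 IDs, num_chosen_tokens = 2, num_total_tokens = 4, shape = (2,)
assignment = [[0, 1], [1, 3], [2, 2]]
token_table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
result = lookup_tokens(assignment, token_table)
```

In this sketch, an assignment that points past the end of the token table triggers the ValueError, which mirrors the documented mismatch between representation size and vocabulary size. How the library treats padding indices is not covered here.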

Attributes Summary

num_tokens

Return the number of tokens selected for each ID.

Methods Summary

from_tokenizer(tokenizer, num_tokens, ...[, ...])

Create a tokenization from applying a tokenizer.

iter_extra_repr()

Iterate over components for extra_repr().

save_assignment(output_path)

Save the assignment to a file.

Attributes Documentation

num_tokens

Return the number of tokens selected for each ID.

Methods Documentation

classmethod from_tokenizer(tokenizer: Tokenizer, num_tokens: int, mapped_triples: Tensor, num_entities: int, num_relations: int, token_representation: str | Representation | type[Representation] | None = None, token_representation_kwargs: Mapping[str, Any] | None = None, **kwargs) → TokenizationRepresentation

Create a tokenization from applying a tokenizer.

Parameters:
  • tokenizer (Tokenizer) – the tokenizer instance.

  • num_tokens (int) – the number of tokens to select for each entity.

  • token_representation (str | Representation | type[Representation] | None) – the pre-instantiated token representations, class, or name of a class

  • token_representation_kwargs (Mapping[str, Any] | None) – additional keyword-based parameters for the token representation

  • mapped_triples (Tensor) – the ID-based triples

  • num_entities (int) – the number of entities

  • num_relations (int) – the number of relations

  • kwargs – additional keyword-based parameters passed to TokenizationRepresentation.__init__

Returns:

A tokenization representation obtained by applying the tokenizer.

Return type:

TokenizationRepresentation
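To make the roles of mapped_triples, num_entities, num_relations, and num_tokens more concrete, here is a minimal sketch of what a tokenizer could compute: for each entity, up to num_tokens IDs of incident relations. Everything in it is hypothetical, including the function relation_tokenize, the offset used for incoming edges, and the padding value of -1. The Tokenizer interface itself is not shown in this section and may differ.

```python
from typing import List, Sequence, Tuple

PADDING = -1  # hypothetical padding index for entities with too few tokens


def relation_tokenize(
    mapped_triples: Sequence[Tuple[int, int, int]],
    num_entities: int,
    num_relations: int,
    num_tokens: int,
) -> List[List[int]]:
    """Pick up to `num_tokens` incident relation IDs per entity (hypothetical)."""
    tokens: List[List[int]] = [[] for _ in range(num_entities)]
    for head, relation, tail in mapped_triples:
        # Outgoing edges contribute the relation ID itself ...
        if relation not in tokens[head]:
            tokens[head].append(relation)
        # ... incoming edges contribute an offset ID (assumed inverse relation).
        inverse = relation + num_relations
        if inverse not in tokens[tail]:
            tokens[tail].append(inverse)
    # Truncate or pad so that every row has exactly `num_tokens` entries.
    return [
        (row[:num_tokens] + [PADDING] * num_tokens)[:num_tokens]
        for row in tokens
    ]


# (head, relation, tail) triples over 4 entities and 2 relations
triples = [(0, 0, 1), (1, 1, 2), (0, 1, 2)]
token_assignment = relation_tokenize(
    triples, num_entities=4, num_relations=2, num_tokens=2
)
```

The result has one row per entity and num_tokens columns, which matches the (n, num_chosen_tokens) shape expected for the assignment. Entity 3 occurs in no triple, so its row consists of padding only.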

iter_extra_repr() → Iterable[str]

Iterate over components for extra_repr().

Return type:

Iterable[str]

save_assignment(output_path: Path)

Save the assignment to a file.

Parameters:

output_path (Path) – the output file path. Its parent directories will be created if necessary.
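The directory handling described above can be sketched with pathlib. This is an illustration under stated assumptions rather than the method's actual code: the helper name save_assignment_sketch is hypothetical, and JSON is used only to keep the example free of dependencies. The on-disk format used by the library is not specified in this section.

```python
import json
import tempfile
from pathlib import Path
from typing import Sequence


def save_assignment_sketch(
    assignment: Sequence[Sequence[int]], output_path: Path
) -> None:
    """Write an assignment to disk, creating missing parent directories."""
    # Create all missing parent directories; do nothing if they already exist.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: JSON stands in for whatever format the library writes.
    output_path.write_text(json.dumps([list(row) for row in assignment]))


with tempfile.TemporaryDirectory() as tmp:
    # Neither "nested" nor "dir" exists before the call.
    target = Path(tmp) / "nested" / "dir" / "assignment.json"
    save_assignment_sketch([[0, 1], [2, -1]], target)
    restored = json.loads(target.read_text())
```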