TokenizationRepresentation

A module holding the result of tokenization.

It represents each index by the concatenation of representations of the corresponding tokens.

\[[T[t] | t \in \textit{tok}(i)]\]

where \(\textit{tok}(i)\) denotes the sequence of token indices for the given index \(i\), and \(T\) stores the representations for each token.
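The lookup in the formula above can be illustrated with plain Python lists. This is a minimal sketch, not the module's implementation: the token table `T`, the `assignment` values, and the helper `represent` are made up for illustration.

```python
# Toy token table T: 4 tokens, each with a 2-dimensional representation.
T = [
    [0.0, 0.1],
    [1.0, 1.1],
    [2.0, 2.1],
    [3.0, 3.1],
]

# Toy assignment of shape (n, num_chosen_tokens) = (3, 2):
# row i lists the token indices tok(i) for index i.
assignment = [
    [0, 2],
    [1, 1],
    [3, 0],
]


def represent(i):
    """Return [T[t] for t in tok(i)], one row per chosen token (hypothetical helper)."""
    return [T[t] for t in assignment[i]]


print(represent(2))  # [[3.0, 3.1], [0.0, 0.1]]
```

With tensors, the same lookup can be written as a single advanced-indexing operation, `T[assignment[indices]]`, whose result has shape `(*indices.shape, num_chosen_tokens, *shape)`.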

Initialize the tokenization.

Parameters:

assignment (Tensor) – shape: (n, num_chosen_tokens) The token assignment.
token_representation (str | Representation | type[Representation] | None) – shape: (num_total_tokens, *shape) The token representations.
token_representation_kwargs (Mapping[str, Any] | None) – Additional keyword-based parameters.
shape (tuple[int, ...]) – The shape of an individual representation. If provided, it has to match (assignment.shape[1], *token_representation.shape).
kwargs – Additional keyword-based parameters passed to Representation.

Raises:

ValueError – If there’s a mismatch between the representation size and the vocabulary size.
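The kind of check that leads to this error can be sketched as follows. This is an illustrative assumption about the check, not the library's code: the function name `check_vocabulary` and the error message are hypothetical, and the assignment is a plain nested list rather than a Tensor.

```python
def check_vocabulary(assignment, num_total_tokens):
    """Raise ValueError if the assignment refers to a token without a representation.

    Hypothetical helper for illustration only.
    """
    # The largest token index in use determines the required vocabulary size.
    vocabulary_size = max(max(row) for row in assignment) + 1
    if num_total_tokens < vocabulary_size:
        raise ValueError(
            f"Only {num_total_tokens} token representations are available, "
            f"but the assignment requires {vocabulary_size}."
        )
    return vocabulary_size
```

For example, an assignment that uses token index 2 requires at least three token representations, so `check_vocabulary([[0, 2], [1, 1]], 2)` raises, while passing `3` succeeds.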

Note

The parameter pair (token_representation, token_representation_kwargs) is used for pykeen.nn.representation_resolver.

An explanation of resolvers and how to use them is given in https://class-resolver.readthedocs.io/en/latest/.
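The general idea behind such a (value, kwargs) pair can be sketched in a few lines. The following is a toy imitation of the resolver pattern, not the class-resolver API: the classes `Representation` and `Embedding`, the `LOOKUP` table, and the function `make` are all hypothetical stand-ins.

```python
class Representation:
    """Hypothetical stand-in for a representation base class."""

    def __init__(self, max_id, shape=(2,)):
        self.max_id = max_id
        self.shape = tuple(shape)


class Embedding(Representation):
    """Hypothetical stand-in for a concrete representation."""


# Toy registry mapping names to classes, plus a default choice.
LOOKUP = {"embedding": Embedding}
DEFAULT = Embedding


def make(query, kwargs=None, **extra):
    """Turn a name, class, instance, or None into an instance."""
    if isinstance(query, Representation):
        return query  # already an instance: use it as-is
    if query is None:
        cls = DEFAULT  # nothing given: fall back to the default class
    elif isinstance(query, str):
        cls = LOOKUP[query.lower()]  # a name: look up the class
    else:
        cls = query  # a class: instantiate it directly
    return cls(**{**(kwargs or {}), **extra})


rep = make("embedding", {"shape": (4,)}, max_id=10)
print(type(rep).__name__, rep.max_id, rep.shape)  # Embedding 10 (4,)
```

This shows why the parameter accepts `str | Representation | type[Representation] | None`: each form is resolved to an instance, and the keyword arguments are only used when a new instance has to be created.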

Attributes Summary

Return the number of selected tokens for each index.

Methods Summary

`from_tokenizer`(tokenizer, num_tokens, ...[, ...])	Create a tokenization from applying a tokenizer.
`iter_extra_repr`()	Iterate over components for `extra_repr()`.
`save_assignment`(output_path)	Save the assignment to a file.

Attributes Documentation

Methods Documentation

classmethod from_tokenizer(tokenizer: Tokenizer, num_tokens: int, mapped_triples: Tensor, num_entities: int, num_relations: int, token_representation: str | Representation | type[Representation] | None = None, token_representation_kwargs: Mapping[str, Any] | None = None, **kwargs) → Self

Create a tokenization from applying a tokenizer.

Parameters:

tokenizer (Tokenizer) – The tokenizer instance.
num_tokens (int) – The number of tokens to select for each entity.
token_representation (str | Representation | type[Representation] | None) – shape: (num_total_tokens, *shape) The token representations.
token_representation_kwargs (Mapping[str, Any] | None) – Additional keyword-based parameters.
mapped_triples (Tensor) – The ID-based triples.
num_entities (int) – The number of entities.
num_relations (int) – The number of relations.
kwargs – Additional keyword-based parameters passed to TokenizationRepresentation.

Returns:

A TokenizationRepresentation created by applying the tokenizer.

Return type:

Self
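To make the roles of mapped_triples, num_entities, num_relations, and num_tokens more concrete, here is a toy tokenizer written with plain Python lists. It is a sketch under stated assumptions, not one of the library's tokenizers: the function `relation_tokens`, the choice to describe each entity by the relations of its outgoing triples, and the use of an extra padding token are all illustrative.

```python
def relation_tokens(mapped_triples, num_entities, num_relations, num_tokens):
    """Toy tokenizer: describe each entity by the relations of its outgoing triples."""
    padding = num_relations  # hypothetical extra token ID used for padding
    tokens = [set() for _ in range(num_entities)]
    for head, relation, _tail in mapped_triples:
        tokens[head].add(relation)

    assignment = []
    for entity_tokens in tokens:
        # Select at most num_tokens tokens, then pad to a fixed length.
        chosen = sorted(entity_tokens)[:num_tokens]
        chosen += [padding] * (num_tokens - len(chosen))
        assignment.append(chosen)

    vocabulary_size = num_relations + 1  # relations plus the padding token
    return vocabulary_size, assignment


triples = [(0, 0, 1), (0, 1, 2), (1, 1, 2), (0, 2, 1)]
vocabulary_size, assignment = relation_tokens(
    triples, num_entities=3, num_relations=3, num_tokens=2
)
print(vocabulary_size)  # 4
print(assignment)  # [[0, 1], [1, 3], [3, 3]]
```

The resulting assignment has one row per entity and num_tokens columns, which is the (n, num_chosen_tokens) shape expected for the assignment parameter.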

iter_extra_repr() → Iterable[str]

Iterate over components for extra_repr().

save_assignment(output_path: Path)

Save the assignment to a file.

Parameters:

output_path (Path) – The output file path. Its parent directories will be created if necessary.
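The behavior of creating missing parent directories can be sketched with pathlib. This is only an illustration: the standalone function below and the JSON file format are assumptions made for the example, since the file format is not stated here.

```python
import json
from pathlib import Path


def save_assignment(assignment, output_path: Path) -> None:
    """Write a nested-list assignment to a file (hypothetical standalone variant)."""
    output_path = Path(output_path)
    # Create all missing parent directories; do nothing if they already exist.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # JSON is an assumption for this sketch only.
    output_path.write_text(json.dumps(assignment))
```

Calling it with a path such as `Path("out/run1/assignment.json")` would create `out/run1/` first if it does not exist yet.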