TokenizationRepresentation
- class TokenizationRepresentation(assignment: Tensor, token_representation: str | Representation | type[Representation] | None = None, token_representation_kwargs: Mapping[str, Any] | None = None, shape: int | Sequence[int] | None = None, **kwargs)
Bases: Representation
A module holding the result of tokenization.
It represents each index by the concatenation of representations of the corresponding tokens.
\[[T[t] \mid t \in \textit{tok}(i)]\]
where \(\textit{tok}(i)\) denotes the sequence of token indices for the given index \(i\), and \(T\) stores the representations for each token.
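The lookup described by the formula above can be sketched without any library. This is only an illustration with plain lists: the names `T`, `assignment`, and `represent` and all values are made up for the example, and the actual class operates on tensors.

```python
# Hypothetical toy data: 3 tokens, each with a 2-dimensional representation (T).
T = [
    [0.0, 0.1],
    [1.0, 1.1],
    [2.0, 2.1],
]

# assignment[i] plays the role of tok(i): the token indices chosen for index i.
assignment = [
    [0, 2],
    [1, 1],
]


def represent(i):
    """Return [T[t] | t in tok(i)], one row per selected token."""
    return [T[t] for t in assignment[i]]


rep = represent(0)
```

Each index is thus described by `num_chosen_tokens` rows, one per selected token, which is why the representation of a single index has one more dimension than the representation of a single token.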
Initialize the tokenization.
- Parameters:
assignment (Tensor) – shape: (n, num_chosen_tokens). The token assignment.
token_representation (str | Representation | type[Representation] | None) – shape: (num_total_tokens, *shape). The token representations.
token_representation_kwargs (Mapping[str, Any] | None) – Additional keyword-based parameters.
shape (tuple[int, ...]) – The shape of an individual representation. If provided, has to match (assignment.shape[1], *token_representation.shape).
kwargs – Additional keyword-based parameters passed to Representation.
- Raises:
ValueError – If there’s a mismatch between the representation size and the vocabulary size.
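The shape constraint and the size mismatch described above can be illustrated with a small, library-free check. This is a sketch under assumptions: `check_tokenization` is a hypothetical helper, not part of PyKEEN, and it assumes that the vocabulary size is the largest token index in the assignment plus one.

```python
def check_tokenization(assignment, num_total_tokens, token_shape, shape=None):
    """Validate a toy token assignment and return the per-index shape.

    Hypothetical helper for illustration; not part of PyKEEN.
    """
    num_chosen_tokens = len(assignment[0])
    # Every row must select the same number of tokens.
    if any(len(row) != num_chosen_tokens for row in assignment):
        raise ValueError("all rows of the assignment need the same length")
    # Assumption: the vocabulary size is the largest token index plus one.
    vocabulary_size = max(max(row) for row in assignment) + 1
    if num_total_tokens < vocabulary_size:
        raise ValueError(
            f"token representations cover {num_total_tokens} tokens, "
            f"but the assignment refers to {vocabulary_size}"
        )
    # One index is represented by (assignment.shape[1], *token_representation.shape).
    expected = (num_chosen_tokens, *token_shape)
    if shape is not None and tuple(shape) != expected:
        raise ValueError(f"shape {tuple(shape)} does not match {expected}")
    return expected


result = check_tokenization([[0, 2], [1, 1]], num_total_tokens=3, token_shape=(4,))
```

With two chosen tokens per index and 4-dimensional token representations, the resulting shape is `(2, 4)`; passing `num_total_tokens=2` for the same assignment raises a `ValueError` in this sketch.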
Note
The parameter pair (token_representation, token_representation_kwargs) is used for pykeen.nn.representation_resolver.
An explanation of resolvers and how to use them is given in https://class-resolver.readthedocs.io/en/latest/.
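To give an idea of what such a parameter pair allows, here is a minimal, library-free sketch of the resolver pattern. All names (`Representation`, `Embedding`, `LOOKUP`, `make`) are hypothetical stand-ins defined for this example; they only mimic the idea of accepting a name, a class, an instance, or None, and are not the actual resolver implementation.

```python
class Representation:
    """Hypothetical stand-in for a representation base class."""

    def __init__(self, max_id, shape=(2,)):
        self.max_id = max_id
        self.shape = tuple(shape)


class Embedding(Representation):
    """Hypothetical stand-in for a concrete representation."""


# Name-based lookup table, analogous to what a resolver maintains.
LOOKUP = {"embedding": Embedding}
DEFAULT = Embedding


def make(query, kwargs=None, **extra):
    """Resolve `query` (None, name, class, or instance) into an instance."""
    if isinstance(query, Representation):
        return query  # already an instance: pass through unchanged
    if query is None:
        cls = DEFAULT
    elif isinstance(query, str):
        cls = LOOKUP[query.lower()]
    else:
        cls = query  # assumed to be a Representation subclass
    return cls(**(kwargs or {}), **extra)


by_name = make("embedding", {"shape": (4,)}, max_id=3)
by_class = make(Embedding, None, max_id=3)
existing = make(by_name)
```

The keyword arguments only matter when a new instance has to be created; an instance that is passed in is returned as-is in this sketch.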
Attributes Summary
num_tokens – Return the number of selected tokens for each index.
Methods Summary
from_tokenizer(tokenizer, num_tokens, ...[, ...]) – Create a tokenization from applying a tokenizer.
Iterate over components for extra_repr().
save_assignment(output_path) – Save the assignment to a file.
Attributes Documentation
- num_tokens
Return the number of selected tokens for each index.
Methods Documentation
- classmethod from_tokenizer(tokenizer: Tokenizer, num_tokens: int, mapped_triples: Tensor, num_entities: int, num_relations: int, token_representation: str | Representation | type[Representation] | None = None, token_representation_kwargs: Mapping[str, Any] | None = None, **kwargs) -> Self
Create a tokenization from applying a tokenizer.
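As a rough picture of what a tokenizer contributes, the following library-free sketch builds an assignment with a fixed number of tokens per entity from ID-based triples. It is an assumption-laden toy: `relation_tokenize`, its `padding` value of -1, and the choice of outgoing relations as tokens are all hypothetical and do not reproduce PyKEEN's Tokenizer interface.

```python
def relation_tokenize(mapped_triples, num_entities, num_tokens, padding=-1):
    """Toy tokenizer: describe each entity by the relations of its outgoing triples.

    Hypothetical helper for illustration; not PyKEEN's Tokenizer interface.
    Returns one row of exactly `num_tokens` token indices per entity.
    """
    relations = [[] for _ in range(num_entities)]
    for head, relation, _tail in mapped_triples:
        if relation not in relations[head]:
            relations[head].append(relation)
    assignment = []
    for tokens in relations:
        chosen = sorted(tokens)[:num_tokens]
        # Pad entities that have fewer than `num_tokens` tokens.
        chosen += [padding] * (num_tokens - len(chosen))
        assignment.append(chosen)
    return assignment


# ID-based triples (head, relation, tail) over 3 entities and 2 relations.
triples = [(0, 0, 1), (0, 1, 2), (1, 1, 2)]
assignment = relation_tokenize(triples, num_entities=3, num_tokens=2)
```

The result has one row per entity and `num_tokens` columns, matching the `(n, num_chosen_tokens)` layout of the `assignment` parameter of the constructor.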
- Parameters:
tokenizer (Tokenizer) – The tokenizer instance.
num_tokens (int) – The number of tokens to select for each entity.
token_representation (str | Representation | type[Representation] | None) – shape: (num_total_tokens, *shape). The token representations.
token_representation_kwargs (Mapping[str, Any] | None) – Additional keyword-based parameters.
mapped_triples (Tensor) – The ID-based triples.
num_entities (int) – The number of entities.
num_relations (int) – The number of relations.
kwargs – Additional keyword-based parameters passed to TokenizationRepresentation.
- Returns:
A TokenizationRepresentation created by applying the tokenizer.
- Return type: