TokenizationRepresentation
- class TokenizationRepresentation(assignment: Tensor, token_representation: str | Representation | type[Representation] | None = None, token_representation_kwargs: Mapping[str, Any] | None = None, shape: int | Sequence[int] | None = None, **kwargs)
Bases: Representation
A module holding the result of tokenization.
Initialize the tokenization.
- Parameters:
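assignment (Tensor) – shape: (n, num_chosen_tokens); the token assignment, i.e., the indices of the tokens chosen for each of the n IDs.
token_representation (str | Representation | type[Representation] | None) – shape: (num_total_tokens, *shape); the token representations, given as a pre-instantiated representation, a class, or the name of a class.
token_representation_kwargs (Mapping[str, Any] | None) – additional keyword-based parameters used when instantiating token_representation.
shape (int | Sequence[int] | None) – the shape of an individual representation. If provided, it has to match the inferred shape.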
kwargs – additional keyword-based parameters passed to Representation.__init__()
- Raises:
ValueError – if there’s a mismatch between the representation size and the vocabulary size
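To make the shapes above concrete, here is a minimal pure-Python sketch of the lookup this class is documented to represent: each of the n IDs is mapped, via its row in assignment, to the vectors of its chosen tokens. The function name lookup_token_representations is hypothetical and this is not PyKEEN's implementation. It assumes the vocabulary size is the largest token index plus one and that the assignment contains no padding entries; the ValueError only rejects assignments that reference more tokens than there are token representations, and the library's exact check may differ.

```python
from typing import List, Sequence

Vector = List[float]


def lookup_token_representations(
    assignment: Sequence[Sequence[int]],
    token_table: Sequence[Vector],
) -> List[List[Vector]]:
    """Gather the token vectors assigned to every ID (hypothetical sketch).

    assignment:  shape (n, num_chosen_tokens), non-negative token indices
    token_table: shape (num_total_tokens, dim)
    returns:     shape (n, num_chosen_tokens, dim)
    """
    indices = [t for row in assignment for t in row]
    if not indices or min(indices) < 0:
        raise ValueError("this sketch expects a non-empty, padding-free assignment")
    # Assumed vocabulary size implied by the assignment: largest index + 1.
    vocabulary_size = max(indices) + 1
    if vocabulary_size > len(token_table):
        raise ValueError(
            f"assignment uses {vocabulary_size} tokens, "
            f"but only {len(token_table)} token representations exist"
        )
    # Replace every token index by a copy of its vector.
    return [[list(token_table[t]) for t in row] for row in assignment]


# Three IDs, two tokens each, drawn from a vocabulary of three 2-d vectors.
assignment = [[0, 2], [1, 2], [0, 1]]
token_table = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
vectors = lookup_token_representations(assignment, token_table)
# vectors[0] == [[1.0, 0.0], [0.5, 0.5]]
```

Keeping the token table separate from the assignment means many IDs can share the same small set of token vectors, which is what the (num_total_tokens, *shape) versus (n, num_chosen_tokens) split expresses.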
Attributes Summary
num_tokens – Return the number of selected tokens for each ID.
Methods Summary
from_tokenizer(tokenizer, num_tokens, ...[, ...]) – Create a tokenization by applying a tokenizer.
Iterate over components for extra_repr().
save_assignment(output_path) – Save the assignment to a file.
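from_tokenizer builds the assignment by running a Tokenizer over mapped_triples. The sketch below shows one way such a tokenizer could work, using a hypothetical relation-based scheme: every entity is described by the relations of the triples it takes part in. The function relation_tokenize, the token numbering (outgoing relation r is token r, incoming relation r is token num_relations + r) and the padding value -1 are all illustrative assumptions, not PyKEEN's Tokenizer interface.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Triple = Tuple[int, int, int]  # (head, relation, tail) IDs
PADDING = -1  # hypothetical index for unused token slots


def relation_tokenize(
    mapped_triples: Sequence[Triple],
    num_entities: int,
    num_relations: int,
    num_tokens: int,
) -> List[List[int]]:
    """Assign each entity up to num_tokens relation tokens (hypothetical).

    The vocabulary has 2 * num_relations tokens: outgoing relation r is
    token r, incoming relation r is token num_relations + r.
    """
    counts = [Counter() for _ in range(num_entities)]
    for head, relation, tail in mapped_triples:
        counts[head][relation] += 1
        counts[tail][num_relations + relation] += 1
    assignment = []
    for entity_counts in counts:
        # Most frequent tokens first; ties broken by the smaller token ID.
        chosen = sorted(entity_counts, key=lambda tok: (-entity_counts[tok], tok))
        chosen = chosen[:num_tokens]
        # Pad so that every row has exactly num_tokens entries.
        assignment.append(chosen + [PADDING] * (num_tokens - len(chosen)))
    return assignment


triples = [(0, 0, 1), (0, 1, 2), (1, 0, 2)]
assignment = relation_tokenize(triples, num_entities=4, num_relations=2, num_tokens=2)
# assignment == [[0, 1], [0, 2], [2, 3], [-1, -1]]
```

Entity 3 occurs in no triple, so its row is all padding; a real token representation would need some convention for such slots.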
Attributes Documentation
- num_tokens
Return the number of selected tokens for each ID.
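Given the documented assignment shape (n, num_chosen_tokens), num_tokens presumably reports the second dimension. The sketch below assumes exactly that; the helper name count_selected_tokens is hypothetical, and counting padding slots as selected tokens is an assumption of this sketch.

```python
from typing import Sequence


def count_selected_tokens(assignment: Sequence[Sequence[int]]) -> int:
    """Token slots per ID, i.e. assignment.shape[1] (hypothetical helper)."""
    widths = {len(row) for row in assignment}
    if len(widths) != 1:
        raise ValueError("assignment must be a non-empty, rectangular (n, k) table")
    return widths.pop()


# Two IDs with three token slots each; -1 marks a hypothetical padding slot.
assignment = [[0, 2, -1], [1, 2, 0]]
k = count_selected_tokens(assignment)  # 3
```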
Methods Documentation
- classmethod from_tokenizer(tokenizer: Tokenizer, num_tokens: int, mapped_triples: Tensor, num_entities: int, num_relations: int, token_representation: str | Representation | type[Representation] | None = None, token_representation_kwargs: Mapping[str, Any] | None = None, **kwargs) → TokenizationRepresentation
Create a tokenization by applying a tokenizer.
- Parameters:
tokenizer (Tokenizer) – the tokenizer instance.
num_tokens (int) – the number of tokens to select for each entity.
mapped_triples (Tensor) – the ID-based triples
num_entities (int) – the number of entities
num_relations (int) – the number of relations
token_representation (str | Representation | type[Representation] | None) – the pre-instantiated token representations, class, or name of a class
token_representation_kwargs (Mapping[str, Any] | None) – additional keyword-based parameters used when instantiating token_representation
kwargs – additional keyword-based parameters passed to TokenizationRepresentation.__init__()
- Returns:
A tokenization representation created by applying the tokenizer
- Return type: