TokenizationRepresentation

class TokenizationRepresentation(assignment, token_representation=None, token_representation_kwargs=None, **kwargs)[source]

Bases: pykeen.nn.representation.Representation

A module holding the result of tokenization.

Initialize the tokenization.

Parameters
  • assignment (LongTensor) – shape: (n, num_chosen_tokens). The token assignment, i.e., the indices of the tokens chosen for each of the n represented items.

  • token_representation (Union[str, Representation, Type[Representation], None]) – shape: (num_total_tokens, *shape). The token representations, either pre-instantiated or given as a hint (name or class) from which they are created.

  • token_representation_kwargs (Optional[Mapping[str, Any]]) – additional keyword-based parameters used when the token representations are created from a hint

  • kwargs – additional keyword-based parameters passed to Representation.__init__

Raises

ValueError – if there’s a mismatch between the representation size and the vocabulary size
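
The shapes above can be illustrated with a minimal sketch in plain Python. It is not PyKEEN's implementation: the helper name lookup_tokens and the use of nested lists instead of tensors are assumptions made for illustration only. Each token index in the assignment is replaced by that token's vector, and a ValueError is raised when the assignment refers to a token for which no vector exists.

```python
from typing import List, Sequence


def lookup_tokens(
    assignment: Sequence[Sequence[int]],
    token_vectors: Sequence[Sequence[float]],
) -> List[List[List[float]]]:
    """Replace every token index in `assignment` by its vector.

    Hypothetical helper for illustration; not part of PyKEEN.
    """
    # The token table must cover every index used in the assignment.
    largest_index = max(max(row) for row in assignment)
    if largest_index >= len(token_vectors):
        raise ValueError(
            f"assignment refers to token {largest_index}, "
            f"but only {len(token_vectors)} token vectors are available"
        )
    # Output shape: (n, num_chosen_tokens, *shape)
    return [[list(token_vectors[t]) for t in row] for row in assignment]


# n = 3, num_chosen_tokens = 2, num_total_tokens = 4, shape = (2,)
assignment = [[0, 1], [1, 3], [2, 2]]
token_vectors = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
result = lookup_tokens(assignment, token_vectors)
```

Here result holds one vector per chosen token per row, so its nested shape is (3, 2, 2).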

Methods Summary

extra_repr()

Set the extra representation of the module.

from_tokenizer(tokenizer, num_tokens, ...[, ...])

Create a tokenization from applying a tokenizer.

Methods Documentation

extra_repr()[source]

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

Return type

str

classmethod from_tokenizer(tokenizer, num_tokens, mapped_triples, num_entities, num_relations, token_representation=None, token_representation_kwargs=None, **kwargs)[source]

Create a tokenization from applying a tokenizer.

Parameters
  • tokenizer (Tokenizer) – the tokenizer instance.

  • num_tokens (int) – the number of tokens to select for each entity.

  • token_representation (Union[str, Representation, Type[Representation], None]) – the pre-instantiated token representations, or a hint (name or class) used to create them

  • token_representation_kwargs (Optional[Mapping[str, Any]]) – additional keyword-based parameters used when creating the token representations

  • mapped_triples (LongTensor) – the ID-based triples

  • num_entities (int) – the number of entities

  • num_relations (int) – the number of relations

  • kwargs – additional keyword-based parameters passed to TokenizationRepresentation.__init__

Return type

TokenizationRepresentation

Returns

A tokenization representation obtained by applying the tokenizer.
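
To make the role of the tokenizer more concrete, the following plain-Python sketch shows one way a token assignment could be derived from ID-based triples. It is an illustration under stated assumptions and not PyKEEN's Tokenizer interface: the function tokenize_by_relation, the use of inverse-relation tokens for tail entities, and the padding value -1 are all hypothetical choices made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

PADDING = -1  # assumption of this sketch: marks unused token slots


def tokenize_by_relation(
    mapped_triples: Sequence[Tuple[int, int, int]],
    num_entities: int,
    num_relations: int,
    num_tokens: int,
) -> Tuple[int, List[List[int]]]:
    """Select up to `num_tokens` relation tokens for each entity.

    Hypothetical helper for illustration; not part of PyKEEN.
    Returns the vocabulary size and an assignment of shape
    (num_entities, num_tokens).
    """
    relations_of: Dict[int, Set[int]] = defaultdict(set)
    for head, relation, tail in mapped_triples:
        relations_of[head].add(relation)
        # Tail entities receive a separate "inverse relation" token.
        relations_of[tail].add(relation + num_relations)
    assignment = []
    for entity in range(num_entities):
        chosen = sorted(relations_of[entity])[:num_tokens]
        # Pad entities that have fewer than `num_tokens` tokens.
        chosen += [PADDING] * (num_tokens - len(chosen))
        assignment.append(chosen)
    return 2 * num_relations, assignment


triples = [(0, 0, 1), (1, 1, 2), (0, 1, 2)]
vocabulary_size, assignment = tokenize_by_relation(
    triples, num_entities=4, num_relations=2, num_tokens=2
)
```

In this sketch every entity receives exactly num_tokens slots, so the assignment is rectangular, and the returned vocabulary size indicates how many token representations would be needed to cover it.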