class PrecomputedPoolTokenizer(*, path=None, url=None, download_kwargs=None, pool=None, randomize_selection=False, loader=None)

Bases: Tokenizer

A tokenizer using externally precomputed tokenization.

Initialize the tokenizer.


Note: The preference order for loading the precomputed pools is (1) from the given pool, (2) from the given path, and (3) by downloading from the given url.


Raises:

ValueError – If the pool's keys are not contiguous on \(0, \dots, N-1\).
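The loading preference and the key check described above can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper names `resolve_pool_source` and `check_contiguous_keys` are hypothetical, and raising `ValueError` when none of `pool`, `path`, or `url` is given is an assumption.

```python
from typing import Collection, Mapping, Optional


def resolve_pool_source(
    pool: Optional[Mapping[int, Collection[int]]] = None,
    path: Optional[str] = None,
    url: Optional[str] = None,
) -> str:
    """Name the source that would be used (hypothetical helper).

    Follows the documented preference order: pool, then path, then url.
    """
    if pool is not None:
        return "pool"
    if path is not None:
        return "path"
    if url is not None:
        return "url"
    # Assumption: providing none of the three is treated as an error.
    raise ValueError("One of pool, path, or url must be provided.")


def check_contiguous_keys(pool: Mapping[int, Collection[int]]) -> None:
    """Raise ValueError unless the keys are exactly 0, 1, ..., N-1 (hypothetical helper)."""
    if set(pool.keys()) != set(range(len(pool))):
        raise ValueError("Pool keys are not contiguous on 0 ... N-1.")


# A pool given directly takes precedence over path and url.
print(resolve_pool_source(pool={0: [1, 2]}, path="pools.file", url="https://example.org/pools"))

# Keys 0 and 2 leave a gap at 1, so this pool would be rejected.
try:
    check_contiguous_keys({0: [1], 2: [3]})
except ValueError as error:
    print(error)
```

With contiguous keys, each key can be read directly as an entity ID, which is why a gap in the keys is rejected up front.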

Methods Summary

__call__(mapped_triples, num_tokens, ...)

Tokenize the entities contained in the given triples.

Methods Documentation

__call__(mapped_triples, num_tokens, num_entities, num_relations)

Tokenize the entities contained in the given triples.

  • mapped_triples (LongTensor) – shape: (n, 3) the ID-based triples

  • num_tokens (int) – the number of tokens to select for each entity

  • num_entities (int) – the number of entities

  • num_relations (int) – the number of relations

Return type:

Tuple[int, LongTensor]


Returns:

shape: (num_entities, num_tokens), with -1 <= res < vocabulary_size. The selected relation IDs for each entity. -1 is used as a padding token.
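The output layout, one row of `num_tokens` IDs per entity with -1 as padding, can be illustrated with a short plain-Python sketch. It rests on several assumptions: the helper `select_tokens` is hypothetical, it returns nested lists rather than a LongTensor, the integer in the returned tuple is taken to be the vocabulary size, and `randomize_selection` is interpreted as shuffling an entity's candidates before truncating them.

```python
import random
from typing import Collection, List, Mapping, Optional, Tuple

PADDING = -1  # padding token, as stated in the return description


def select_tokens(
    pool: Mapping[int, Collection[int]],
    num_tokens: int,
    randomize_selection: bool = False,
    seed: Optional[int] = None,
) -> Tuple[int, List[List[int]]]:
    """Pick up to num_tokens tokens per entity and pad short rows (hypothetical helper)."""
    rng = random.Random(seed)
    rows: List[List[int]] = []
    # Keys are assumed to be contiguous on 0 ... N-1.
    for entity_id in range(len(pool)):
        candidates = list(pool[entity_id])
        if randomize_selection:
            # Assumed semantics: random choice instead of the first num_tokens.
            rng.shuffle(candidates)
        chosen = candidates[:num_tokens]
        chosen += [PADDING] * (num_tokens - len(chosen))
        rows.append(chosen)
    # Assumption: the vocabulary size is one more than the largest token ID.
    largest = max((t for tokens in pool.values() for t in tokens), default=-1)
    return largest + 1, rows


vocabulary_size, tokens = select_tokens({0: [4, 2, 7], 1: [1], 2: []}, num_tokens=2)
print(vocabulary_size)  # 8
print(tokens)  # [[4, 2], [1, -1], [-1, -1]]
```

Every row has exactly `num_tokens` entries, so the result can be stacked into a single rectangular tensor of shape (num_entities, num_tokens).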