MetisAnchorTokenizer

class MetisAnchorTokenizer(num_partitions: int = 2, device: str | device | None = None, **kwargs)

Bases: AnchorTokenizer

An anchor tokenizer that first partitions the graph using METIS and then applies one anchor tokenizer per partition.

Partitioning uses the METIS binding provided by torch_sparse. The METIS graph partitioning algorithm is described here: http://glaros.dtc.umn.edu/gkhome/metis/metis/overview

Initialize the tokenizer.

Parameters:
  • num_partitions (int) – the number of partitions obtained through METIS.

  • device (str | device | None) – the device to use for tokenization

  • kwargs – additional keyword-based parameters passed to AnchorTokenizer.__init__(). Note that there will be one anchor tokenizer per partition, i.e., the vocabulary size grows with the number of partitions.
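
The note on vocabulary growth can be illustrated with a small sketch in plain Python. This is not PyKEEN's implementation: the helper name `merge_partition_tokens`, the input layout, and the offsetting scheme are assumptions made for illustration. It shows one plausible way that per-partition token IDs could be shifted into a shared vocabulary, so that the total vocabulary size is the sum over all partitions.

```python
PADDING = -1  # padding token, as documented for the tokenizer output


def merge_partition_tokens(partitions, num_entities, num_tokens):
    """Combine per-partition token assignments into one global assignment.

    Hypothetical helper: `partitions` is a list of
    (vocabulary_size, entity_ids, local_assignment) triples, where
    local_assignment[i] holds the partition-local token IDs of entity_ids[i].
    """
    # start with an all-padding assignment for every entity
    assignment = [[PADDING] * num_tokens for _ in range(num_entities)]
    offset = 0
    for vocabulary_size, entity_ids, local_assignment in partitions:
        for entity_id, tokens in zip(entity_ids, local_assignment):
            # shift local token IDs by the offset; padding stays untouched
            assignment[entity_id] = [
                token + offset if token != PADDING else PADDING
                for token in tokens
            ]
        # the next partition's tokens start after this partition's vocabulary
        offset += vocabulary_size
    return offset, assignment


# two partitions with two anchors each, covering four entities
partitions = [
    (2, [0, 1], [[0, 1], [1, PADDING]]),
    (2, [2, 3], [[0, PADDING], [1, 0]]),
]
total_vocabulary_size, assignment = merge_partition_tokens(
    partitions, num_entities=4, num_tokens=2
)
```

Under these assumptions, doubling the number of partitions while keeping the per-partition anchor count fixed doubles the total vocabulary size.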

Methods Summary

__call__(mapped_triples, num_tokens, ...)

Tokenize the entities contained in the given triples.

Methods Documentation

__call__(mapped_triples: Tensor, num_tokens: int, num_entities: int, num_relations: int) → tuple[int, Tensor]

Tokenize the entities contained in the given triples.

Parameters:
  • mapped_triples (Tensor) – shape: (n, 3) the ID-based triples

  • num_tokens (int) – the number of tokens to select for each entity

  • num_entities (int) – the number of entities

  • num_relations (int) – the number of relations

Returns:

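A tuple (vocabulary_size, assignment). The assignment tensor has shape (num_entities, num_tokens) with -1 <= res < vocabulary_size and contains the selected token IDs (for this tokenizer, anchor IDs) for each entity. -1 is used as a padding token.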

Return type:

tuple[int, Tensor]
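
The documented output contract can be checked with a short sketch. It uses nested Python lists in place of a tensor so that it runs without any dependencies; the helper names `check_assignment` and `real_tokens` are hypothetical and not part of PyKEEN.

```python
PADDING = -1  # padding token, as documented above


def check_assignment(vocabulary_size, assignment, num_entities, num_tokens):
    """Return True if the assignment has shape (num_entities, num_tokens)
    and every entry satisfies -1 <= entry < vocabulary_size."""
    if len(assignment) != num_entities:
        return False
    for row in assignment:
        if len(row) != num_tokens:
            return False
        if any(token < PADDING or token >= vocabulary_size for token in row):
            return False
    return True


def real_tokens(row):
    """Drop padding entries from one entity's token row."""
    return [token for token in row if token != PADDING]


# illustrative output for three entities with two token slots each
vocabulary_size = 4
assignment = [[0, 1], [3, PADDING], [2, PADDING]]

is_valid = check_assignment(vocabulary_size, assignment, num_entities=3, num_tokens=2)
tokens_of_entity_1 = real_tokens(assignment[1])
```

Entities with fewer than num_tokens selected tokens are padded with -1, so downstream code typically needs to mask or filter these entries before looking up token representations.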