AnchorTokenizer

class AnchorTokenizer(selection: str | AnchorSelection | type[AnchorSelection] | None = None, selection_kwargs: Mapping[str, Any] | None = None, searcher: str | AnchorSearcher | type[AnchorSearcher] | None = None, searcher_kwargs: Mapping[str, Any] | None = None)[source]

Bases: Tokenizer

Tokenize entities by representing them as a bag of anchor entities.

The anchor entities are picked by the selection strategy; each entity is then assigned its closest anchors, e.g. by shortest-path distance, using the searcher.

Initialize the tokenizer.

Parameters:
  • selection (str | AnchorSelection | type[AnchorSelection] | None) – the anchor node selection strategy.

  • selection_kwargs (Mapping[str, Any] | None) – additional keyword-based arguments passed to the selection strategy.

  • searcher (str | AnchorSearcher | type[AnchorSearcher] | None) – the component used to search for the closest anchors of each entity.

  • searcher_kwargs (Mapping[str, Any] | None) – additional keyword-based arguments passed to the searcher.

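A minimal construction sketch. The module path, the "degree" and "csgraph" strategy names, and the num_anchors keyword are assumptions drawn from typical PyKEEN usage, not from this page:

  from pykeen.nn.node_piece import AnchorTokenizer

  # Choose anchors by node degree and assign each entity its closest anchors
  # via shortest-path search on a sparse adjacency graph.
  tokenizer = AnchorTokenizer(
      selection="degree",                     # anchor selection strategy, resolved by name (assumed)
      selection_kwargs=dict(num_anchors=32),  # assumed keyword: number of anchor entities to select
      searcher="csgraph",                     # shortest-path anchor search, resolved by name (assumed)
  )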
Methods Summary

__call__(mapped_triples, num_tokens, ...)

Tokenize the entities contained in the given triples.

Methods Documentation

__call__(mapped_triples: Tensor, num_tokens: int, num_entities: int, num_relations: int) tuple[int, Tensor][source]

Tokenize the entities contained in the given triples.

Parameters:
  • mapped_triples (Tensor) – shape: (n, 3) the ID-based triples

  • num_tokens (int) – the number of tokens to select for each entity

  • num_entities (int) – the number of entities

  • num_relations (int) – the number of relations

Returns:

the vocabulary size, and the token assignment of shape: (num_entities, num_tokens) with -1 <= res < vocabulary_size, i.e., the token IDs selected for each entity (here, anchor entity IDs). -1 is used as a padding token.

Return type:

tuple[int, Tensor]
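A hypothetical call on a tiny toy graph; the triples, counts, and the num_anchors keyword below are illustrative assumptions, not taken from this page:

  import torch

  from pykeen.nn.node_piece import AnchorTokenizer

  # Construct with default selection and searcher, limiting the number of anchors
  # to fit the toy graph (num_anchors is an assumed keyword of the selection strategy).
  tokenizer = AnchorTokenizer(selection_kwargs=dict(num_anchors=3))

  # Toy ID-based triples (head, relation, tail) over 5 entities and 2 relations.
  mapped_triples = torch.as_tensor(
      [[0, 0, 1], [1, 1, 2], [2, 0, 3], [3, 1, 4]],
      dtype=torch.long,
  )
  vocabulary_size, assignment = tokenizer(
      mapped_triples=mapped_triples,
      num_tokens=2,     # keep the two closest anchor tokens per entity
      num_entities=5,
      num_relations=2,
  )
  # assignment has shape (5, 2); entries are anchor token IDs, with -1 as padding.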