NodePieceRepresentation

class NodePieceRepresentation(*, triples_factory, token_representations=None, token_representations_kwargs=None, tokenizers=None, tokenizers_kwargs=None, num_tokens=2, aggregation=None, max_id=None, shape=None, **kwargs)[source]

Bases: pykeen.nn.representation.Representation

Basic implementation of node piece decomposition [galkin2021].

\[x_e = agg(\{T[t] \mid t \in tokens(e) \})\]

where \(T\) are token representations, \(tokens\) selects a fixed number of \(k\) tokens for each entity, and \(agg\) is an aggregation function, which aggregates the individual token representations to a single entity representation.

Note

This implementation currently only supports representation of entities by bag-of-relations.
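As a schematic illustration of the formula above (a plain-NumPy sketch, independent of PyKEEN's actual implementation), the token table \(T\), the token selection \(tokens(e)\), and the default mean aggregation can be written out directly. All sizes and the token assignment below are made up for illustration:

```python
import numpy as np

# Hypothetical toy sizes: vocabulary of 5 tokens, dimension 4, k = 2 tokens per entity.
num_tokens_total, dim, k = 5, 4, 2

# T: the token representation table, one row per token in the vocabulary.
rng = np.random.default_rng(seed=0)
T = rng.normal(size=(num_tokens_total, dim))

# tokens(e): a fixed number k of token indices assigned to entity e,
# e.g. as produced by a bag-of-relations tokenizer (indices chosen arbitrarily here).
tokens_of_entity = np.array([1, 3])

# agg: here the default, a mean over the token axis.
x_e = T[tokens_of_entity].mean(axis=0)

# the entity representation has the same shape as a single token representation
assert x_e.shape == (dim,)
```

Because the mean does not change the feature dimensionality, no explicit `shape` needs to be passed in this case.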

Initialize the representation.

Parameters
  • triples_factory (CoreTriplesFactory) – the triples factory

  • token_representations (Union[str, Representation, Type[Representation], None, Sequence[Union[str, Representation, Type[Representation], None]]]) – the token representation specification, or pre-instantiated representation module.

  • token_representations_kwargs (Union[Mapping[str, Any], None, Sequence[Optional[Mapping[str, Any]]]]) – additional keyword-based parameters

  • tokenizers (Union[str, Tokenizer, Type[Tokenizer], None, Sequence[Union[str, Tokenizer, Type[Tokenizer], None]]]) – the tokenizer to use, cf. pykeen.nn.node_piece.tokenizer_resolver.

  • tokenizers_kwargs (Union[Mapping[str, Any], None, Sequence[Optional[Mapping[str, Any]]]]) – additional keyword-based parameters passed to the tokenizer upon construction.

  • num_tokens (Union[int, Sequence[int]]) – the number of tokens for each entity.

  • aggregation (Union[None, str, Callable[[FloatTensor, int], FloatTensor]]) –

    aggregation of multiple token representations to a single entity representation. By default, this uses torch.mean(). If a string is provided, the module assumes that it refers to a top-level torch function, e.g., “mean” for torch.mean() or “sum” for torch.sum(). An aggregation can also have trainable parameters, e.g., MLP(mean(MLP(tokens))) (cf. DeepSets from [zaheer2017]). In this case, the module has to be created outside of this component.

    We could also have aggregations which result in differently shaped output, e.g., a concatenation of all token embeddings resulting in shape (num_tokens * d,). In this case, shape must be provided.

    The aggregation takes two arguments: the (batched) tensor of token representations, of shape (*, num_tokens, *dt), and the index along which to aggregate.

  • shape (Optional[Sequence[int]]) – the shape of an individual representation. Only necessary if the aggregation results in a change of dimensions; this is typically only the case for ad hoc aggregation functions.

  • max_id (Optional[int]) – only pass this to verify that it matches the number of entities in the triples factory.

  • kwargs – additional keyword-based parameters passed to super.__init__

Raises

ValueError – if the shapes for any vocabulary entry in all token representations are inconsistent
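To make the interplay of a custom aggregation and the shape parameter concrete, here is a hypothetical ad hoc aggregation (again a plain-NumPy sketch, not PyKEEN code) that concatenates the token vectors instead of averaging them, following the documented contract of an input of shape (*, num_tokens, *dt) plus an aggregation axis:

```python
import numpy as np

# Hypothetical ad hoc aggregation: concatenate token vectors instead of
# averaging them. Signature mirrors the documented contract
# (tensor of shape (*, num_tokens, *dt), aggregation axis) -> tensor.
def concat_aggregation(x: np.ndarray, dim: int = -2) -> np.ndarray:
    # this sketch assumes the tokens sit on the second-to-last axis
    assert dim in (-2, x.ndim - 2)
    # flatten the token axis into the feature axis
    return x.reshape(x.shape[:-2] + (-1,))

batch, num_tokens, d = 3, 2, 4
x = np.arange(batch * num_tokens * d, dtype=float).reshape(batch, num_tokens, d)
out = concat_aggregation(x)

# the per-entity dimensionality changes from d to num_tokens * d ...
assert out.shape == (batch, num_tokens * d)
```

Since the output dimensionality differs from that of a single token representation, such an aggregation would require passing `shape=(num_tokens * d,)` explicitly, as described for the shape parameter above.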

Methods Summary

extra_repr()

Set the extra representation of the module.

Methods Documentation

extra_repr()[source]

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

Return type

str