NodePieceRepresentation
- class NodePieceRepresentation(*, triples_factory, token_representations=None, token_representations_kwargs=None, tokenizers=None, tokenizers_kwargs=None, num_tokens=2, aggregation=None, max_id=None, shape=None, **kwargs)
Bases: pykeen.nn.representation.Representation
Basic implementation of node piece decomposition [galkin2021].
\[x_e = agg(\{T[t] \mid t \in tokens(e) \})\]
where \(T\) are the token representations, \(tokens\) selects a fixed number \(k\) of tokens for each entity \(e\), and \(agg\) is an aggregation function that combines the individual token representations into a single entity representation \(x_e\).
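To make the formula concrete, here is a minimal sketch of the lookup-then-aggregate step. It uses NumPy arrays as a stand-in for torch tensors, and the names `T`, `tokens`, and `aggregate_mean` are illustrative assumptions, not part of the PyKEEN API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 5 tokens in the vocabulary, 4-dimensional token vectors,
# and k = 2 tokens per entity.
num_vocab, dim, k = 5, 4, 2

# T: one row per token (the token representations).
T = rng.normal(size=(num_vocab, dim))

# tokens(e): for each of 3 entities, the indices of its k tokens.
tokens = np.array([[0, 3], [1, 1], [4, 2]])


def aggregate_mean(token_reps, axis):
    """Average the token representations along the token axis."""
    return token_reps.mean(axis=axis)


# T[tokens] has shape (3, k, dim); aggregating over axis -2 yields (3, dim).
x = aggregate_mean(T[tokens], axis=-2)
```

Each row of `x` corresponds to one entity representation \(x_e\). The mean is used here because the documentation names it as the default aggregation.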
Note
This implementation currently only supports representation of entities by bag-of-relations.
Initialize the representation.
- Parameters
  - triples_factory (CoreTriplesFactory) – the triples factory
  - token_representations (Union[str, Representation, Type[Representation], None, Sequence[Union[str, Representation, Type[Representation], None]]]) – the token representation specification, or a pre-instantiated representation module.
  - token_representations_kwargs (Union[Mapping[str, Any], None, Sequence[Optional[Mapping[str, Any]]]]) – additional keyword-based parameters
  - tokenizers (Union[str, Tokenizer, Type[Tokenizer], None, Sequence[Union[str, Tokenizer, Type[Tokenizer], None]]]) – the tokenizer to use, cf. pykeen.nn.node_piece.tokenizer_resolver.
  - tokenizers_kwargs (Union[Mapping[str, Any], None, Sequence[Optional[Mapping[str, Any]]]]) – additional keyword-based parameters passed to the tokenizer upon construction.
  - num_tokens (Union[int, Sequence[int]]) – the number of tokens for each entity.
  - aggregation (Union[None, str, Callable[[FloatTensor, int], FloatTensor]]) – aggregation of multiple token representations to a single entity representation. By default, this uses torch.mean(). If a string is provided, the module assumes that it refers to a top-level torch function, e.g. "mean" for torch.mean(), or "sum" for torch.sum(). An aggregation can also have trainable parameters, e.g., MLP(mean(MLP(tokens))) (cf. DeepSets from [zaheer2017]). In this case, the module has to be created outside of this component.
    We could also have aggregations which result in differently shaped output, e.g. a concatenation of all token embeddings resulting in shape (num_tokens * d,). In this case, shape must be provided.
    The aggregation takes two arguments: the (batched) tensor of token representations, in shape (*, num_tokens, *dt), and the index along which to aggregate.
  - shape (Optional[Sequence[int]]) – the shape of an individual representation. Only necessary if the aggregation results in a change of dimensions, which can only happen if the aggregation is an ad hoc function.
  - max_id (Optional[int]) – only pass this to check that it matches the number of entities in the triples factory.
  - kwargs – additional keyword-based parameters passed to super.__init__
- Raises
ValueError – if the shapes for any vocabulary entry in all token representations are inconsistent
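The aggregation parameter mentions a concatenation that changes the output shape to (num_tokens * d,). The sketch below illustrates only the shape bookkeeping of such a two-argument aggregation. It is written with NumPy as a stand-in, whereas the real callable would receive torch tensors, and `concat_tokens` is a hypothetical helper, not a PyKEEN function.

```python
import numpy as np


def concat_tokens(token_reps, axis):
    """Merge the token axis with the feature axis that follows it.

    Mirrors the documented two-argument form: the batched token
    representations and the index along which to aggregate.
    Assumes a single feature axis directly after the token axis.
    """
    axis = axis % token_reps.ndim  # normalise negative indices
    new_shape = token_reps.shape[:axis] + (-1,) + token_reps.shape[axis + 2:]
    return token_reps.reshape(new_shape)


# Hypothetical batch: 3 entities, num_tokens = 2, d = 4.
batch = np.arange(3 * 2 * 4, dtype=float).reshape(3, 2, 4)
out = concat_tokens(batch, axis=-2)  # shape (3, 8), i.e. (3, num_tokens * d)
```

With an aggregation of this kind, the individual representation shape is no longer (d,), which is why the documentation requires shape to be passed explicitly, here presumably as (8,).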
Methods Summary
Set the extra representation of the module
Methods Documentation