NodePieceRepresentation
- class NodePieceRepresentation(*, triples_factory, token_representations=None, token_representations_kwargs=None, tokenizers=None, tokenizers_kwargs=None, num_tokens=2, aggregation=None, max_id=None, shape=None, **kwargs)[source]
Bases: pykeen.nn.representation.Representation
Basic implementation of node piece decomposition [galkin2021].
\[x_e = agg(\{T[t] \mid t \in tokens(e) \})\]
where \(T\) are token representations, \(tokens\) selects a fixed number of \(k\) tokens for each entity, and \(agg\) is an aggregation function which aggregates the individual token representations into a single entity representation.
Note
This implementation currently only supports representation of entities by bag-of-relations.
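The formula above can be illustrated with a minimal pure-Python sketch on toy data (this is not the PyKEEN implementation; the triples, dimensions, and helper names here are illustrative assumptions). Each entity is tokenized into a fixed number of \(k\) tokens via a bag of its incident relations, the token representations are looked up in \(T\), and a mean aggregation produces the entity representation:

```python
from collections import Counter

# Toy knowledge graph as (head, relation, tail) triples (illustrative data).
triples = [(0, 0, 1), (1, 1, 2), (0, 1, 2), (2, 0, 0)]

NUM_TOKENS = 2  # k: fixed number of tokens per entity
DIM = 4         # dimensionality of each token representation

# T: one representation vector per relation token (toy, fixed values;
# in practice these would be learned embeddings).
T = {r: [float(r + 1)] * DIM for r in {r for _, r, _ in triples}}

def tokens(e):
    """Bag-of-relations tokenization: the k most frequent relations incident to e."""
    bag = Counter(r for h, r, t in triples if e in (h, t))
    return [r for r, _ in bag.most_common(NUM_TOKENS)]

def agg(vectors):
    """Aggregate token representations by taking the element-wise mean."""
    return [sum(xs) / len(xs) for xs in zip(*vectors)]

def represent(e):
    """x_e = agg({T[t] | t in tokens(e)})"""
    return agg([T[t] for t in tokens(e)])

print(represent(0))  # mean of T[0] = [1.0]*4 and T[1] = [2.0]*4 -> [1.5, 1.5, 1.5, 1.5]
```

Entity 0 occurs in three triples with relations {0, 0, 1}, so its bag-of-relations tokens are [0, 1], and its representation is the mean of the corresponding token vectors.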
Initialize the representation.
- Parameters
  - triples_factory (CoreTriplesFactory) – the triples factory
  - token_representations (Union[str, Representation, Type[Representation], None, Sequence[Union[str, Representation, Type[Representation], None]]]) – the token representation specification, or a pre-instantiated representation module.
  - token_representations_kwargs (Union[Mapping[str, Any], None, Sequence[Optional[Mapping[str, Any]]]]) – additional keyword-based parameters
  - tokenizers (Union[str, Tokenizer, Type[Tokenizer], None, Sequence[Union[str, Tokenizer, Type[Tokenizer], None]]]) – the tokenizer to use, cf. pykeen.nn.node_piece.tokenizer_resolver.
  - tokenizers_kwargs (Union[Mapping[str, Any], None, Sequence[Optional[Mapping[str, Any]]]]) – additional keyword-based parameters passed to the tokenizer upon construction.
  - num_tokens (Union[int, Sequence[int]]) – the number of tokens for each entity.
  - aggregation (Union[None, str, Callable[[FloatTensor, int], FloatTensor]]) – aggregation of multiple token representations to a single entity representation. By default, this uses torch.mean(). If a string is provided, the module assumes that it refers to a top-level torch function, e.g., "mean" for torch.mean() or "sum" for torch.sum(). An aggregation can also have trainable parameters, e.g., MLP(mean(MLP(tokens))) (cf. DeepSets from [zaheer2017]); in this case, the module has to be created outside of this component. An aggregation may also produce output of a different shape, e.g., a concatenation of all token embeddings resulting in shape (num_tokens * d,); in this case, shape must be provided. The aggregation takes two arguments: the (batched) tensor of token representations, of shape (*, num_tokens, *dt), and the index along which to aggregate.
  - shape (Optional[Sequence[int]]) – the shape of an individual representation. Only necessary if the aggregation results in a change of dimensions; this will only be the case if the aggregation is an ad hoc function.
  - max_id (Optional[int]) – only pass this to check whether the number of entities in the triples factory is the same.
  - kwargs – additional keyword-based parameters passed to super().__init__
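To see why shape becomes necessary for an ad hoc aggregation, consider a concatenation instead of the default mean. The following pure-Python sketch (toy data and helper names, not the PyKEEN implementation) shows that concatenation changes the output dimensionality from d to num_tokens * d, which the module cannot infer on its own:

```python
NUM_TOKENS = 2  # k: tokens per entity (toy value)
DIM = 4         # d: dimensionality of each token representation (toy value)

# Token representations for one entity: shape (num_tokens, d).
token_reprs = [[1.0] * DIM, [2.0] * DIM]

def mean_agg(xs):
    """Default-style aggregation: output keeps shape (d,)."""
    return [sum(col) / len(col) for col in zip(*xs)]

def concat_agg(xs):
    """Ad hoc aggregation: output has shape (num_tokens * d,), so a value
    like shape=(NUM_TOKENS * DIM,) would have to be passed explicitly."""
    return [x for vec in xs for x in vec]

print(len(mean_agg(token_reprs)))    # d = 4: shape is unchanged, no shape argument needed
print(len(concat_agg(token_reprs)))  # num_tokens * d = 8: shape must be provided
```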
- Raises
ValueError – if the shapes for any vocabulary entry in all token representations are inconsistent
Methods Summary
Set the extra representation of the module
Methods Documentation