NodePieceRepresentation¶

class NodePieceRepresentation(*, triples_factory, token_representations=None, token_representations_kwargs=None, tokenizers=None, tokenizers_kwargs=None, num_tokens=2, aggregation=None, max_id=None, **kwargs)[source]

Basic implementation of node piece decomposition [galkin2021].

$x_e = \mathrm{agg}(\{T[t] \mid t \in \mathrm{tokens}(e) \})$

where $T$ are token representations, $\mathrm{tokens}$ selects a fixed number of $k$ tokens for each entity, and $\mathrm{agg}$ is an aggregation function, which aggregates the individual token representations into a single entity representation.
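The formula above can be sketched numerically. The following is a minimal illustration (not PyKEEN's implementation), assuming a mean aggregation and hypothetical token-id assignments: a token table $T$, an index array selecting $k$ tokens per entity, and averaging over the token axis.

```python
import numpy as np

# Minimal sketch of NodePiece-style aggregation (names are illustrative).
num_tokens_total, dim, k = 10, 4, 2
rng = np.random.default_rng(0)
T = rng.normal(size=(num_tokens_total, dim))  # token representation table T
tokens = np.array([[0, 3], [1, 1], [5, 9]])   # k=2 token ids per entity
x = T[tokens].mean(axis=1)                    # agg = mean over each entity's k tokens
print(x.shape)  # (3, 4): one dim-4 representation per entity
```

Each row of `x` is the entity representation $x_e$; swapping the mean for a sum or an MLP over the token axis changes only the `agg` step.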

Initialize the representation.

Parameters:

Methods Summary

 Estimate the diversity of the tokens via their hashes.

Methods Documentation

estimate_diversity()[source]

Estimate the diversity of the tokens via their hashes.

Return type:

HashDiversityInfo

Returns:

A ratio information tuple

Tokenization strategies might produce exactly the same hashes for several nodes, depending on the graph structure and tokenization parameters. Identical hashes result in identical node representations and hence might inhibit downstream performance. This function comes in handy when you need to estimate the diversity of the node hashes built under a certain tokenization strategy; ideally, you would want every node to have a unique hash. The function computes how many node hashes are unique in each representation and overall (if we concatenate all of them into a single row). A value of 1.0 means that all nodes have unique hashes.
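The ratio itself reduces to counting distinct rows. Here is a small sketch (not the library's implementation) with made-up token-id rows, treating each entity's token ids under one tokenizer as its hash:

```python
import numpy as np

# Hypothetical hashes: each row is one entity's token ids under a single tokenizer.
hashes = np.array([
    [0, 3],
    [1, 2],
    [0, 3],  # collides with the first entity
    [4, 5],
])
unique_rows = np.unique(hashes, axis=0).shape[0]
ratio = unique_rows / hashes.shape[0]
print(ratio)  # 0.75: two entities share the hash (0, 3)
```

A ratio below 1.0 signals that some entities are indistinguishable to the model under this tokenization and may warrant more anchors or tokens.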

Example usage:

from pykeen.datasets import FB15k237
from pykeen.models import NodePiece

# NodePiece tokenization expects inverse triples to be present
dataset = FB15k237(create_inverse_triples=True)

model = NodePiece(
    triples_factory=dataset.training,
    tokenizers=["AnchorTokenizer", "RelationTokenizer"],
    num_tokens=[20, 12],
    embedding_dim=64,
    interaction="rotate",
    relation_constrainer="complex_normalize",
    entity_initializer="xavier_uniform_",
)
print(model.entity_representations[0].estimate_diversity())