PartitionRepresentation

class PartitionRepresentation(assignment, shape=None, bases=None, bases_kwargs=None, **kwargs)

Bases: Representation

A partition of the indices into different representation modules.

Each index is assigned to an index in exactly one of the base representations. This representation is useful, e.g., when one of the base representations cannot provide vectors for each of the indices, and another representation is used as back-up.
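
The lookup described above can be sketched in plain Python. This is an illustrative model of the semantics only, not the library's implementation: the helper name partition_lookup and the toy vectors are assumptions, and nested lists stand in for tensors.

```python
def partition_lookup(assignment, bases, indices):
    """Return one vector per global index by delegating to the assigned base."""
    vectors = []
    for global_id in indices:
        # Each global index maps to exactly one (base_id, local_id) pair.
        base_id, local_id = assignment[global_id]
        vectors.append(bases[base_id][local_id])
    return vectors


# Toy stand-ins (assumptions for illustration): base 0 holds fixed text
# features for two entities, base 1 holds vectors for the remaining three.
text_features = [[1.0, 1.0], [2.0, 2.0]]
other_vectors = [[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]]
assignment = [(1, 0), (0, 0), (1, 1), (1, 2), (0, 1)]

vectors = partition_lookup(assignment, [text_features, other_vectors], indices=range(5))
print(vectors)
```

Global indices 1 and 4 are served by the first base, all others by the second, so every index resolves to exactly one vector.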

Consider the following example: We only have textual information for two entities. We want to use textual features computed from them, which should not be trained. For the remaining entities we want to use directly trainable embeddings.

We start by creating the representation for those entities where we have labels:

>>> from pykeen.nn import Embedding, init
>>> num_entities = 5
>>> labels = {1: "a first description", 4: "a second description"}
>>> label_initializer = init.LabelBasedInitializer(labels=list(labels.values()))
>>> label_repr = label_initializer.as_embedding()

Next, we create trainable embeddings for the remaining entities:

>>> non_label_repr = Embedding(max_id=num_entities - len(labels), shape=label_repr.shape)

To combine them into a single representation module, we first need to define the assignment, i.e., where to look up each global id. For this, we create a tensor of shape (num_entities, 2), where each row holds the index of the base representation and the local index inside that representation:

>>> import torch
>>> assignment = torch.as_tensor([(1, 0), (0, 0), (1, 1), (1, 2), (0, 1)])
>>> from pykeen.nn import PartitionRepresentation
>>> entity_repr = PartitionRepresentation(assignment=assignment, bases=[label_repr, non_label_repr])
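
For larger entity sets, writing the assignment by hand becomes error-prone. A minimal sketch of deriving it from the labelled ids follows, assuming base 0 is the label-based representation and base 1 holds the remaining entities. The helper build_assignment is hypothetical and not part of the library.

```python
def build_assignment(num_entities, labelled_ids):
    """Return (base_id, local_id) pairs: base 0 = labelled, base 1 = the rest."""
    # Position of each labelled entity within the label-based representation.
    labelled_position = {entity_id: pos for pos, entity_id in enumerate(labelled_ids)}
    pairs = []
    next_free = 0  # next unused local index in the non-label representation
    for entity_id in range(num_entities):
        if entity_id in labelled_position:
            pairs.append((0, labelled_position[entity_id]))
        else:
            pairs.append((1, next_free))
            next_free += 1
    return pairs


labels = {1: "a first description", 4: "a second description"}
pairs = build_assignment(num_entities=5, labelled_ids=list(labels))
print(pairs)
```

The resulting list of pairs matches the hand-written assignment and can be converted with torch.as_tensor in the same way. The order of labelled_ids must match the order in which the labels were passed to the initializer.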

For brevity, we use randomly generated triples factories here instead of actual data:

>>> from pykeen.triples.generation import generate_triples_factory
>>> training = generate_triples_factory(num_entities=num_entities, num_relations=5, num_triples=31)
>>> testing = generate_triples_factory(num_entities=num_entities, num_relations=5, num_triples=17)

The combined representation can now be used as any other representation, e.g., to train a DistMult model:

>>> from pykeen.pipeline import pipeline
>>> from pykeen.models import ERModel
>>> pipeline(
...     model=ERModel,
...     interaction="distmult",
...     model_kwargs=dict(
...         entity_representation=entity_repr,
...         relation_representation_kwargs=dict(shape=label_repr.shape),
...     ),
...     training=training,
...     testing=testing,
... )

Initialize the representation.

Warning

The base representations must have coherent, i.e., matching, shapes.

Parameters:
  • assignment (LongTensor) – shape: (max_id, 2). The assignment, as tuples (base_id, local_id), where base_id is the index of the base representation and local_id is the index to look up within that base representation

  • shape (Union[int, Sequence[int], None]) – the shape of an individual representation. If provided, it must match the shape of the bases.

  • bases (Union[str, Representation, Type[Representation], None, Sequence[Union[str, Representation, Type[Representation], None]]]) – the base representations, or hints thereof.

  • bases_kwargs (Union[Mapping[str, Any], None, Sequence[Optional[Mapping[str, Any]]]]) – keyword-based parameters to instantiate the base representations

  • kwargs – additional keyword-based parameters passed to Representation.__init__(). Must not contain max_id or shape, which are inferred from the base representations.

Raises:

ValueError – if any of the inputs is invalid
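
Checking an assignment before constructing the module can help to locate invalid inputs early. The following is a rough sketch of such a pre-check in plain Python. The helper check_assignment and its rules are assumptions for illustration and do not necessarily mirror the checks the library performs.

```python
def check_assignment(pairs, base_sizes):
    """Raise ValueError if any (base_id, local_id) pair points outside the bases."""
    for global_id, pair in enumerate(pairs):
        if len(pair) != 2:
            raise ValueError(f"entry {global_id} is not a (base_id, local_id) pair: {pair}")
        base_id, local_id = pair
        if not 0 <= base_id < len(base_sizes):
            raise ValueError(f"entry {global_id} refers to unknown base {base_id}")
        if not 0 <= local_id < base_sizes[base_id]:
            raise ValueError(
                f"entry {global_id} uses local id {local_id}, "
                f"but base {base_id} only has {base_sizes[base_id]} entries"
            )


# Two labelled entities in base 0, three remaining entities in base 1.
check_assignment([(1, 0), (0, 0), (1, 1), (1, 2), (0, 1)], base_sizes=[2, 3])

try:
    check_assignment([(0, 2)], base_sizes=[2, 3])  # local id 2 is out of range
except ValueError as error:
    print(f"rejected: {error}")
```

Here base_sizes lists the number of entries of each base representation, i.e., the value that would be passed as max_id when creating it.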