KG2E¶

class KG2E(*, embedding_dim=50, dist_similarity=None, c_min=0.05, c_max=5.0, entity_initializer=<function uniform_>, entity_constrainer=<function clamp_norm>, entity_constrainer_kwargs=None, relation_initializer=<function uniform_>, relation_constrainer=<function clamp_norm>, relation_constrainer_kwargs=None, **kwargs)[source]

An implementation of KG2E from [he2015].

KG2E aims to explicitly model (un)certainties in entities and relations (e.g., influenced by the number of triples observed for these entities and relations). Therefore, entities and relations are represented by probability distributions, in particular by multi-variate Gaussian distributions $$\mathcal{N}_i(\mu_i,\Sigma_i)$$ where the mean $$\mu_i \in \mathbb{R}^d$$ denotes the position in the vector space and the diagonal variance $$\Sigma_i \in \mathbb{R}^{d \times d}$$ models the uncertainty. Inspired by the pykeen.models.TransE model, relations are modeled as transformations from head to tail entities: $$\mathcal{H} - \mathcal{T} \approx \mathcal{R}$$ where $$\mathcal{H} \sim \mathcal{N}_h(\mu_h,\Sigma_h)$$, $$\mathcal{T} \sim \mathcal{N}_t(\mu_t,\Sigma_t)$$, $$\mathcal{R} \sim \mathcal{P}_r = \mathcal{N}_r(\mu_r,\Sigma_r)$$, and $$\mathcal{H} - \mathcal{T} \sim \mathcal{P}_e = \mathcal{N}_{h-t}(\mu_h - \mu_t,\Sigma_h + \Sigma_t)$$ (since head and tail entities are considered to be independent with regard to the relations). The interaction model measures the similarity between $$\mathcal{P}_e$$ and $$\mathcal{P}_r$$ by means of the Kullback-Leibler divergence (KG2E.kullback_leibler_similarity()).
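For diagonal covariances, the parameters of the difference distribution $\mathcal{P}_e$ follow directly from the head and tail parameters; a minimal sketch, with all values purely illustrative:

```python
import torch

# Toy diagonal Gaussians for a head and a tail entity (d = 2); values are illustrative.
mu_h = torch.tensor([1.0, 2.0])
mu_t = torch.tensor([0.5, 1.5])
sigma_h = torch.tensor([0.2, 0.3])  # diagonal of Sigma_h
sigma_t = torch.tensor([0.1, 0.4])  # diagonal of Sigma_t

# H and T are independent, so H - T ~ N(mu_h - mu_t, Sigma_h + Sigma_t).
mu_e = mu_h - mu_t
sigma_e = sigma_h + sigma_t
```

Because the covariances are diagonal, they can be stored as length-$d$ vectors, which is what the `sigma_*` tensors in the method signatures below hold.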

$f(h,r,t) = \mathcal{D_{KL}}(\mathcal{P}_e, \mathcal{P}_r)$

Besides the asymmetric KL divergence, the authors propose a symmetric variant which uses the expected likelihood (KG2E.expected_likelihood()).

$f(h,r,t) = \mathcal{D_{EL}}(\mathcal{P}_e, \mathcal{P}_r)$

Initialize KG2E.

Parameters
Raises

ValueError – if an illegal dist_similarity is given

Attributes Summary

constrainer_default_kwargs – The default settings for the entity constrainer

hpo_default – The default strategy for optimizing the model’s hyper-parameters

Methods Summary

expected_likelihood(mu_e, mu_r, sigma_e, sigma_r) – Compute the similarity based on expected likelihood.

kullback_leibler_similarity(mu_e, mu_r, …) – Compute the similarity based on KL divergence.

post_parameter_update() – Has to be called after each parameter update.

score_h(rt_batch) – Forward pass using left side (head) prediction.

score_hrt(hrt_batch) – Forward pass.

score_t(hr_batch) – Forward pass using right side (tail) prediction.

Attributes Documentation

constrainer_default_kwargs = {'dim': -1, 'maxnorm': 1.0, 'p': 2}

The default settings for the entity constrainer
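These defaults constrain entity embeddings to the unit $L_2$ ball. A rough sketch of such a constrainer (an illustrative re-implementation under those assumptions, not PyKEEN's exact clamp_norm):

```python
import torch

def clamp_norm_sketch(x, maxnorm=1.0, p=2, dim=-1):
    """Illustrative sketch: rescale any vector whose p-norm along `dim`
    exceeds `maxnorm` back onto the maxnorm ball; leave others unchanged."""
    norm = x.norm(p=p, dim=dim, keepdim=True)
    scale = torch.where(
        norm > maxnorm,
        maxnorm / norm.clamp_min(1e-12),  # shrink vectors outside the ball
        torch.ones_like(norm),            # keep vectors already inside
    )
    return x * scale
```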

hpo_default: ClassVar[Mapping[str, Any]] = {'c_max': {'high': 10.0, 'low': 1.0, 'type': <class 'float'>}, 'c_min': {'high': 0.1, 'low': 0.01, 'scale': 'log', 'type': <class 'float'>}, 'embedding_dim': {'high': 256, 'low': 16, 'q': 16, 'type': <class 'int'>}}

The default strategy for optimizing the model’s hyper-parameters

Methods Documentation

static expected_likelihood(mu_e, mu_r, sigma_e, sigma_r, epsilon=1e-10)[source]

Compute the similarity based on expected likelihood.

$D((\mu_e, \Sigma_e), (\mu_r, \Sigma_r)) = \frac{1}{2} \left( (\mu_e - \mu_r)^T(\Sigma_e + \Sigma_r)^{-1}(\mu_e - \mu_r) + \log \det (\Sigma_e + \Sigma_r) + d \log (2 \pi) \right) = \frac{1}{2} \left( \mu^T\Sigma^{-1}\mu + \log \det \Sigma + d \log (2 \pi) \right)$

where $\mu = \mu_e - \mu_r$ and $\Sigma = \Sigma_e + \Sigma_r$.
Parameters
• mu_e (FloatTensor) – torch.Tensor, shape: (s_1, …, s_k, d) The mean of the first Gaussian.

• mu_r (FloatTensor) – torch.Tensor, shape: (s_1, …, s_k, d) The mean of the second Gaussian.

• sigma_e (FloatTensor) – torch.Tensor, shape: (s_1, …, s_k, d) The diagonal covariance matrix of the first Gaussian.

• sigma_r (FloatTensor) – torch.Tensor, shape: (s_1, …, s_k, d) The diagonal covariance matrix of the second Gaussian.

• epsilon (float) – float (default=1e-10) Small constant used to avoid numerical issues when dividing.

Return type

FloatTensor

Returns

torch.Tensor, shape: (s_1, …, s_k) The similarity.
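For diagonal covariances stored as vectors, the expected-likelihood formula reduces to element-wise operations; a hedged sketch of that computation (illustrative, not PyKEEN's exact implementation):

```python
import math
import torch

def expected_likelihood_diag(mu_e, mu_r, sigma_e, sigma_r, epsilon=1e-10):
    """Sketch of the expected-likelihood similarity for diagonal Gaussians.

    sigma_* hold the diagonals of the covariance matrices; shapes (..., d).
    """
    mu = mu_e - mu_r                       # mean of the difference distribution
    sigma = sigma_e + sigma_r              # diagonal of the summed covariance
    d = mu.shape[-1]
    safe_sigma = sigma.clamp_min(epsilon)  # avoid division by ~0
    return 0.5 * (
        (mu ** 2 / safe_sigma).sum(dim=-1)  # mu^T Sigma^{-1} mu
        + safe_sigma.log().sum(dim=-1)      # log det Sigma (diagonal case)
        + d * math.log(2 * math.pi)         # d log(2 pi)
    )
```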

static kullback_leibler_similarity(mu_e, mu_r, sigma_e, sigma_r, epsilon=1e-10)[source]

Compute the similarity based on KL divergence.

This is done between two Gaussian distributions given by mean mu_* and diagonal covariance matrix sigma_*.

$D((\mu_e, \Sigma_e), (\mu_r, \Sigma_r)) = \frac{1}{2} \left( \operatorname{tr}(\Sigma_r^{-1}\Sigma_e) + (\mu_r - \mu_e)^T\Sigma_r^{-1}(\mu_r - \mu_e) - \log \frac{\det(\Sigma_e)}{\det(\Sigma_r)} - d \right)$

where $d$ denotes the dimensionality of the embedding space.
Note: The sign of the function has been flipped compared to the description in the paper, since the Kullback-Leibler divergence is large if the distributions are dissimilar.

Parameters
• mu_e (FloatTensor) – torch.Tensor, shape: (s_1, …, s_k, d) The mean of the first Gaussian.

• mu_r (FloatTensor) – torch.Tensor, shape: (s_1, …, s_k, d) The mean of the second Gaussian.

• sigma_e (FloatTensor) – torch.Tensor, shape: (s_1, …, s_k, d) The diagonal covariance matrix of the first Gaussian.

• sigma_r (FloatTensor) – torch.Tensor, shape: (s_1, …, s_k, d) The diagonal covariance matrix of the second Gaussian.

• epsilon (float) – float (default=1e-10) Small constant used to avoid numerical issues when dividing.

Return type

FloatTensor

Returns

torch.Tensor, shape: (s_1, …, s_k) The similarity.
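As above, the diagonal case admits a simple element-wise form of the KL divergence; a hedged sketch (illustrative, not PyKEEN's exact implementation):

```python
import torch

def kl_divergence_diag(mu_e, mu_r, sigma_e, sigma_r, epsilon=1e-10):
    """Sketch of KL(N_e || N_r) for diagonal Gaussians; shapes (..., d).

    sigma_* hold the diagonals of the covariance matrices.
    """
    d = mu_e.shape[-1]
    safe_r = sigma_r.clamp_min(epsilon)  # avoid division by ~0
    trace = (sigma_e / safe_r).sum(dim=-1)            # tr(Sigma_r^{-1} Sigma_e)
    maha = ((mu_r - mu_e) ** 2 / safe_r).sum(dim=-1)  # Mahalanobis term
    log_det_ratio = (                                 # log det(Sigma_e)/det(Sigma_r)
        sigma_e.clamp_min(epsilon).log().sum(dim=-1) - safe_r.log().sum(dim=-1)
    )
    return 0.5 * (trace + maha - log_det_ratio - d)
```

For identical distributions the divergence is zero, as expected.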

post_parameter_update()[source]

Has to be called after each parameter update.

Return type

None

score_h(rt_batch)[source]

Forward pass using left side (head) prediction.

This method calculates the score for all possible heads for each (relation, tail) pair.

Parameters

rt_batch (LongTensor) – shape: (batch_size, 2), dtype: long The indices of (relation, tail) pairs.

Return type

FloatTensor

Returns

shape: (batch_size, num_entities), dtype: float For each r-t pair, the scores for all possible heads.

score_hrt(hrt_batch)[source]

Forward pass.

This method takes head, relation and tail of each triple and calculates the corresponding score.

Parameters

hrt_batch (LongTensor) – shape: (batch_size, 3), dtype: long The indices of (head, relation, tail) triples.

Return type

FloatTensor

Returns

shape: (batch_size, 1), dtype: float The score for each triple.

score_t(hr_batch)[source]

Forward pass using right side (tail) prediction.

This method calculates the score for all possible tails for each (head, relation) pair.

Parameters

hr_batch (LongTensor) – shape: (batch_size, 2), dtype: long The indices of (head, relation) pairs.

Return type

FloatTensor

Returns

shape: (batch_size, num_entities), dtype: float For each h-r pair, the scores for all possible tails.
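The three scoring methods consume long-typed index batches of the shapes documented above; a sketch of how such batches are built (the indices and the trained `model` here are hypothetical):

```python
import torch

# Hypothetical (head, relation, tail) index triples; values are illustrative.
hrt_batch = torch.as_tensor([[0, 1, 2], [3, 0, 1]], dtype=torch.long)  # (batch_size, 3)
hr_batch = hrt_batch[:, :2]  # (head, relation) pairs, as expected by score_t
rt_batch = hrt_batch[:, 1:]  # (relation, tail) pairs, as expected by score_h

# Assuming a trained KG2E instance `model`:
# model.score_hrt(hrt_batch)  # shape: (2, 1)
# model.score_t(hr_batch)     # shape: (2, num_entities)
# model.score_h(rt_batch)     # shape: (2, num_entities)
```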