KG2E¶

class KG2E(triples_factory, embedding_dim=50, loss=None, preferred_device=None, random_seed=None, dist_similarity=None, c_min=0.05, c_max=5.0, regularizer=None, entity_initializer=<function uniform_>, entity_constrainer=<function clamp_norm>, entity_constrainer_kwargs=None, relation_initializer=<function uniform_>, relation_constrainer=<function clamp_norm>, relation_constrainer_kwargs=None)[source]¶

Bases: pykeen.models.base.EntityRelationEmbeddingModel
An implementation of KG2E from [he2015].
KG2E aims to explicitly model the (un)certainty of entities and relations (e.g., as influenced by the number of triples observed for them). To this end, entities and relations are represented by probability distributions, in particular by multivariate Gaussian distributions \(\mathcal{N}_i(\mu_i,\Sigma_i)\), where the mean \(\mu_i \in \mathbb{R}^d\) denotes the position in the vector space and the diagonal variance \(\Sigma_i \in \mathbb{R}^{d \times d}\) models the uncertainty. Inspired by the pykeen.models.TransE model, relations are modeled as transformations from head to tail entities: \(\mathcal{H} - \mathcal{T} \approx \mathcal{R}\), where \(\mathcal{H} \sim \mathcal{N}_h(\mu_h,\Sigma_h)\), \(\mathcal{T} \sim \mathcal{N}_t(\mu_t,\Sigma_t)\), \(\mathcal{R} \sim \mathcal{P}_r = \mathcal{N}_r(\mu_r,\Sigma_r)\), and \(\mathcal{H} - \mathcal{T} \sim \mathcal{P}_e = \mathcal{N}_{h-t}(\mu_h - \mu_t,\Sigma_h + \Sigma_t)\) (since head and tail entities are considered independent with respect to the relations). The interaction model measures the similarity between \(\mathcal{P}_e\) and \(\mathcal{P}_r\) by means of the Kullback-Leibler divergence (KG2E.kullback_leibler_similarity()):

\[f(h,r,t) = \mathcal{D_{KL}}(\mathcal{P}_e, \mathcal{P}_r)\]

Besides the asymmetric KL divergence, the authors propose a symmetric variant based on the expected likelihood (KG2E.expected_likelihood()):

\[f(h,r,t) = \mathcal{D_{EL}}(\mathcal{P}_e, \mathcal{P}_r)\]

Initialize KG2E.
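Since head and tail are treated as independent, the difference distribution \(\mathcal{P}_e\) has the closed form given above. A minimal NumPy sketch of this step (illustrative only; the helper name is hypothetical, and diagonal covariances are stored as vectors, not in PyKEEN's internal representation):

```python
import numpy as np


def difference_gaussian(mu_h, sigma_h, mu_t, sigma_t):
    """Combine head and tail Gaussians into the (h - t) Gaussian.

    Since head and tail are treated as independent,
    H - T ~ N(mu_h - mu_t, Sigma_h + Sigma_t).
    Diagonal covariances are stored as vectors of shape (d,).
    """
    return mu_h - mu_t, sigma_h + sigma_t


# toy 4-dimensional embeddings
mu_h, sigma_h = np.array([1.0, 0.0, 2.0, -1.0]), np.full(4, 0.5)
mu_t, sigma_t = np.array([0.5, 0.0, 1.0, -1.0]), np.full(4, 0.25)

# parameters of P_e = N(mu_h - mu_t, Sigma_h + Sigma_t)
mu_e, sigma_e = difference_gaussian(mu_h, sigma_h, mu_t, sigma_t)
```

The resulting \((\mu_e, \Sigma_e)\) pair is what the similarity functions below compare against the relation's \((\mu_r, \Sigma_r)\).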
- Parameters
Attributes Summary

constrainer_default_kwargs
    The default settings for the entity constrainer

hpo_default
    The default strategy for optimizing the model’s hyper-parameters
Methods Summary

expected_likelihood(mu_e, mu_r, sigma_e, sigma_r)
    Compute the similarity based on expected likelihood.

kullback_leibler_similarity(mu_e, mu_r, …)
    Compute the similarity based on KL divergence.

post_parameter_update()
    Has to be called after each parameter update.

score_h(rt_batch)
    Forward pass using left side (head) prediction.

score_hrt(hrt_batch)
    Forward pass.

score_t(hr_batch)
    Forward pass using right side (tail) prediction.
Attributes Documentation
constrainer_default_kwargs = {'dim': -1, 'maxnorm': 1.0, 'p': 2}¶

    The default settings for the entity constrainer
hpo_default: ClassVar[Mapping[str, Any]] = {'c_max': {'high': 10.0, 'low': 1.0, 'type': <class 'float'>}, 'c_min': {'high': 0.1, 'low': 0.01, 'scale': 'log', 'type': <class 'float'>}, 'embedding_dim': {'high': 256, 'low': 16, 'q': 16, 'type': <class 'int'>}}¶

    The default strategy for optimizing the model’s hyper-parameters
Methods Documentation
static expected_likelihood(mu_e, mu_r, sigma_e, sigma_r, epsilon=1e-10)[source]¶

    Compute the similarity based on expected likelihood.
\[D((\mu_e, \Sigma_e), (\mu_r, \Sigma_r)) = \frac{1}{2} \left( (\mu_e - \mu_r)^T(\Sigma_e + \Sigma_r)^{-1}(\mu_e - \mu_r) + \log \det (\Sigma_e + \Sigma_r) + d \log (2 \pi) \right) = \frac{1}{2} \left( \mu^T\Sigma^{-1}\mu + \log \det \Sigma + d \log (2 \pi) \right)\]

where \(\mu = \mu_e - \mu_r\) and \(\Sigma = \Sigma_e + \Sigma_r\).

- Parameters
mu_e (FloatTensor) – torch.Tensor, shape: (s_1, …, s_k, d). The mean of the first Gaussian.

mu_r (FloatTensor) – torch.Tensor, shape: (s_1, …, s_k, d). The mean of the second Gaussian.

sigma_e (FloatTensor) – torch.Tensor, shape: (s_1, …, s_k, d). The diagonal covariance matrix of the first Gaussian.

sigma_r (FloatTensor) – torch.Tensor, shape: (s_1, …, s_k, d). The diagonal covariance matrix of the second Gaussian.

epsilon (float) – float (default=1e-10). Small constant used to avoid numerical issues when dividing.
- Return type
FloatTensor
- Returns
torch.Tensor, shape: (s_1, …, s_k) The similarity.
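Because the covariances are diagonal, the matrix inverse and determinant in the formula reduce to elementwise operations. The following NumPy sketch mirrors the formula under that assumption (an illustration, not PyKEEN's torch implementation):

```python
import numpy as np


def expected_likelihood(mu_e, mu_r, sigma_e, sigma_r, epsilon=1e-10):
    """Negative log expected likelihood of two Gaussians with diagonal covariance.

    mu_*: means, shape (..., d); sigma_*: diagonal covariances as vectors, shape (..., d).
    Implements
        0.5 * (mu^T Sigma^{-1} mu + log det Sigma + d * log(2*pi))
    with mu = mu_e - mu_r and Sigma = sigma_e + sigma_r.
    """
    mu = mu_e - mu_r
    sigma = sigma_e + sigma_r
    d = mu.shape[-1]
    return 0.5 * (
        np.sum(mu**2 / (sigma + epsilon), axis=-1)  # mu^T Sigma^{-1} mu
        + np.sum(np.log(sigma + epsilon), axis=-1)  # log det Sigma
        + d * np.log(2 * np.pi)                     # normalization term
    )
```

For a single 1-dimensional pair with \(\mu = 1\) and \(\Sigma = 2\), this evaluates to \(\frac{1}{2}(\frac{1}{2} + \log 2 + \log 2\pi)\), as expected from the formula.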
static kullback_leibler_similarity(mu_e, mu_r, sigma_e, sigma_r, epsilon=1e-10)[source]¶

    Compute the similarity based on KL divergence.
This is done between two Gaussian distributions given by mean mu_* and diagonal covariance matrix sigma_*.
\[D((\mu_e, \Sigma_e), (\mu_r, \Sigma_r)) = \frac{1}{2} \left( tr(\Sigma_r^{-1}\Sigma_e) + (\mu_r - \mu_e)^T\Sigma_r^{-1}(\mu_r - \mu_e) - \log \frac{\det(\Sigma_e)}{\det(\Sigma_r)} - d \right)\]

- Note: The sign of the function has been flipped as opposed to the description in the paper, as the Kullback-Leibler divergence is large if the distributions are dissimilar.
- Parameters
mu_e (FloatTensor) – torch.Tensor, shape: (s_1, …, s_k, d). The mean of the first Gaussian.

mu_r (FloatTensor) – torch.Tensor, shape: (s_1, …, s_k, d). The mean of the second Gaussian.

sigma_e (FloatTensor) – torch.Tensor, shape: (s_1, …, s_k, d). The diagonal covariance matrix of the first Gaussian.

sigma_r (FloatTensor) – torch.Tensor, shape: (s_1, …, s_k, d). The diagonal covariance matrix of the second Gaussian.

epsilon (float) – float (default=1e-10). Small constant used to avoid numerical issues when dividing.
- Return type
FloatTensor
- Returns
torch.Tensor, shape: (s_1, …, s_k) The similarity.
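For diagonal covariances the KL divergence likewise reduces to elementwise operations. A NumPy sketch of the divergence itself (illustrative only; per the note above, the similarity used by the model is the negated divergence):

```python
import numpy as np


def kl_divergence(mu_e, mu_r, sigma_e, sigma_r, epsilon=1e-10):
    """KL(N_e || N_r) for Gaussians with diagonal covariances stored as vectors.

    Implements
        0.5 * ( tr(Sigma_r^{-1} Sigma_e)
              + (mu_r - mu_e)^T Sigma_r^{-1} (mu_r - mu_e)
              - log (det Sigma_e / det Sigma_r)
              - d )
    """
    inv_sigma_r = 1.0 / (sigma_r + epsilon)
    mu = mu_r - mu_e
    d = mu.shape[-1]
    return 0.5 * (
        np.sum(inv_sigma_r * sigma_e, axis=-1)  # trace term
        + np.sum(mu**2 * inv_sigma_r, axis=-1)  # Mahalanobis term
        - np.sum(np.log(sigma_e + epsilon) - np.log(sigma_r + epsilon), axis=-1)
        - d
    )
```

As a sanity check, the divergence of a distribution from itself is zero, so the negated similarity attains its maximum for identical \(\mathcal{P}_e\) and \(\mathcal{P}_r\).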
score_h(rt_batch)[source]¶

    Forward pass using left side (head) prediction.
This method calculates the score for all possible heads for each (relation, tail) pair.
- Parameters
rt_batch (LongTensor) – shape: (batch_size, 2), dtype: long. The indices of (relation, tail) pairs.

- Return type
FloatTensor
- Returns
shape: (batch_size, num_entities), dtype: float For each r-t pair, the scores for all possible heads.
score_hrt(hrt_batch)[source]¶

    Forward pass.
This method takes head, relation and tail of each triple and calculates the corresponding score.
- Parameters
hrt_batch (LongTensor) – shape: (batch_size, 3), dtype: long. The indices of (head, relation, tail) triples.

- Raises
NotImplementedError – If the method was not implemented for this class.
- Return type
FloatTensor
- Returns
shape: (batch_size, 1), dtype: float The score for each triple.
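Putting the pieces together, a batched triple scorer along these lines could look as follows. This is a sketch with hypothetical array-based embedding tables, not PyKEEN's internals (which operate on torch tensors); the epsilon stabilizer is omitted for brevity:

```python
import numpy as np


def score_hrt(hrt_batch, ent_mu, ent_sigma, rel_mu, rel_sigma):
    """Score each (h, r, t) triple as the negated KL divergence between
    the (h - t) Gaussian and the relation Gaussian.

    hrt_batch: int array, shape (batch_size, 3) of (head, relation, tail) indices.
    ent_mu/ent_sigma: per-entity means and diagonal covariances, shape (num_entities, d).
    rel_mu/rel_sigma: per-relation means and diagonal covariances, shape (num_relations, d).
    """
    h, r, t = hrt_batch[:, 0], hrt_batch[:, 1], hrt_batch[:, 2]
    # P_e = N(mu_h - mu_t, Sigma_h + Sigma_t)
    mu_e = ent_mu[h] - ent_mu[t]
    sigma_e = ent_sigma[h] + ent_sigma[t]
    # KL(P_e || P_r) for diagonal covariances
    mu = rel_mu[r] - mu_e
    inv = 1.0 / rel_sigma[r]
    d = mu.shape[-1]
    kl = 0.5 * (
        np.sum(inv * sigma_e, axis=-1)
        + np.sum(mu**2 * inv, axis=-1)
        - np.sum(np.log(sigma_e) - np.log(rel_sigma[r]), axis=-1)
        - d
    )
    return -kl[:, None]  # shape: (batch_size, 1)
```

When the (h - t) distribution coincides with the relation distribution, the divergence is zero and the triple receives the maximal score.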
score_t(hr_batch)[source]¶

    Forward pass using right side (tail) prediction.
This method calculates the score for all possible tails for each (head, relation) pair.
- Parameters
hr_batch (LongTensor) – shape: (batch_size, 2), dtype: long. The indices of (head, relation) pairs.

- Return type
FloatTensor
- Returns
shape: (batch_size, num_entities), dtype: float For each h-r pair, the scores for all possible tails.