Negative Sampling

For entities $$\mathcal{E}$$ and relations $$\mathcal{R}$$, the set of all possible triples $$\mathcal{T}$$ is constructed through their cartesian product $$\mathcal{T} = \mathcal{E} \times \mathcal{R} \times \mathcal{E}$$. A given knowledge graph $$\mathcal{K}$$ is a subset of all possible triples $$\mathcal{K} \subseteq \mathcal{T}$$.
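As a toy sketch (with hypothetical entities and relations, not from any real dataset), the cartesian product $$\mathcal{T} = \mathcal{E} \times \mathcal{R} \times \mathcal{E}$$ can be enumerated directly:

```python
from itertools import product

# Hypothetical toy sets of entities and relations
entities = {"berlin", "germany", "france"}
relations = {"capital_of", "borders"}

# All possible triples T = E x R x E
all_triples = set(product(entities, relations, entities))

# |T| = |E| * |R| * |E|
assert len(all_triples) == len(entities) ** 2 * len(relations)

# A knowledge graph K is a subset of T
knowledge_graph = {("berlin", "capital_of", "germany")}
assert knowledge_graph <= all_triples
```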

Construction of Knowledge Graphs

When constructing a knowledge graph $$\mathcal{K}_{\text{closed}}$$ under the closed world assumption, the labels of the remaining triples $$(h,r,t) \in \mathcal{T} \setminus \mathcal{K}_{\text{closed}}$$ are defined as negative. When constructing a knowledge graph $$\mathcal{K}_{\text{open}}$$ under the open world assumption, the labels of the remaining triples $$(h,r,t) \in \mathcal{T} \setminus \mathcal{K}_{\text{open}}$$ are unknown.
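The difference between the two assumptions can be sketched as a labeling function over a hypothetical toy graph: triples outside $$\mathcal{K}$$ are labeled negative under the closed world assumption but remain unlabeled under the open world assumption.

```python
from itertools import product

entities = {"a", "b"}
relations = {"r"}
all_triples = set(product(entities, relations, entities))
kg = {("a", "r", "b")}  # hypothetical known positives

def label(triple, closed_world=False):
    """Label a triple: positives are known; the rest depends on the assumption."""
    if triple in kg:
        return 1  # known positive
    # CWA: everything else is negative; OWA: everything else is unknown
    return 0 if closed_world else None

assert label(("a", "r", "b")) == 1
assert label(("a", "r", "a"), closed_world=True) == 0
assert label(("a", "r", "a"), closed_world=False) is None
```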

Because most knowledge graphs are generated under the open world assumption, negative sampling techniques must be employed during the training of knowledge graph embedding models to avoid over-generalization.

Corruption

Negative sampling techniques often generate negative triples by corrupting a known positive triple $$(h,r,t) \in \mathcal{K}$$, replacing either $$h$$, $$r$$, or $$t$$ using one of the following operations:

Corrupt heads: $$\mathcal{H}(h, r, t) = \{(h', r, t) \mid h' \in \mathcal{E} \land h' \neq h\}$$

Corrupt relations: $$\mathcal{R}(h, r, t) = \{(h, r', t) \mid r' \in \mathcal{R} \land r' \neq r\}$$

Corrupt tails: $$\mathcal{T}(h, r, t) = \{(h, r, t') \mid t' \in \mathcal{E} \land t' \neq t\}$$
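The head and tail corruption operations can be sketched as set comprehensions over hypothetical toy entities:

```python
def corrupt_heads(h, r, t, entities):
    """H(h, r, t): replace the head with every other entity."""
    return {(h2, r, t) for h2 in entities if h2 != h}

def corrupt_tails(h, r, t, entities):
    """T(h, r, t): replace the tail with every other entity."""
    return {(h, r, t2) for t2 in entities if t2 != t}

entities = {"a", "b", "c"}
heads = corrupt_heads("a", "r", "b", entities)
tails = corrupt_tails("a", "r", "b", entities)

# Each operation yields |E| - 1 candidate triples
assert heads == {("b", "r", "b"), ("c", "r", "b")}
assert tails == {("a", "r", "a"), ("a", "r", "c")}
```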

Typically, the corrupt relations operation $$\mathcal{R}(h, r, t)$$ is omitted because the evaluation of knowledge graph embedding models on the link prediction task only considers the quality of head prediction and tail prediction, but not relation prediction. Therefore, the set of candidate negative triples $$\mathcal{N}(h, r, t)$$ for a given known positive triple $$(h,r,t) \in \mathcal{K}$$ is given by:

$$\mathcal{N}(h, r, t) = \mathcal{T}(h, r, t) \cup \mathcal{H}(h, r, t)$$

Generally, the set of potential negative triples $$\mathcal{N}$$ over all positive triples $$(h,r,t) \in \mathcal{K}$$ is defined as:

$$\mathcal{N} = \bigcup_{(h,r,t) \in \mathcal{K}} \mathcal{N}(h, r, t)$$
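Continuing the toy sketch (hypothetical entities and triples), $$\mathcal{N}$$ is just the union of the per-triple candidate sets:

```python
entities = {"a", "b", "c"}
kg = {("a", "r", "b"), ("b", "r", "c")}  # hypothetical known positives

def candidate_negatives(h, r, t):
    """N(h, r, t): head and tail corruptions of one positive triple."""
    heads = {(h2, r, t) for h2 in entities if h2 != h}
    tails = {(h, r, t2) for t2 in entities if t2 != t}
    return heads | tails

# N over all positives is the union of the per-triple candidate sets
negatives = set().union(*(candidate_negatives(*t) for t in kg))

# Each positive contributes 2 * (|E| - 1) candidates, but the union may overlap
assert all(len(candidate_negatives(*t)) == 2 * (len(entities) - 1) for t in kg)
assert len(negatives) == 6  # 8 candidates in total, 2 shared between the sets
```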

Uniform Negative Sampling

The default negative sampler pykeen.sampling.BasicNegativeSampler generates corrupted triples from a known positive triple $$(h,r,t) \in \mathcal{K}$$ by choosing uniformly at random between the corrupt heads and corrupt tails operations. The default negative sampler is automatically used in the following code:

from pykeen.pipeline import pipeline

results = pipeline(
    dataset='YAGO3-10',
    model='PairRE',
    training_loop='sLCWA',
)


It can be set explicitly with:

from pykeen.pipeline import pipeline

results = pipeline(
    dataset='YAGO3-10',
    model='PairRE',
    training_loop='sLCWA',
    negative_sampler='basic',
)


In general, the behavior of the negative sampler can be modified when using pykeen.pipeline.pipeline() by passing the negative_sampler_kwargs argument. To explicitly specify which of the head, relation, and tail corruption methods are used, the corruption_scheme argument can be used. For example, to use all three, the collection ('h', 'r', 't') can be passed as in the following:

from pykeen.pipeline import pipeline

results = pipeline(
    dataset='YAGO3-10',
    model='PairRE',
    training_loop='sLCWA',
    negative_sampler='basic',
    negative_sampler_kwargs=dict(
        corruption_scheme=('h', 'r', 't'),
    ),
)


Bernoulli Negative Sampling

The Bernoulli negative sampler pykeen.sampling.BernoulliNegativeSampler generates corrupted triples from a known positive triple $$(h,r,t) \in \mathcal{K}$$ similarly to the uniform negative sampler, but it pre-computes a probability $$p_r$$ for each relation $$r$$: head corruption is applied with probability $$p_r$$ and tail corruption with probability $$1 - p_r$$.

from pykeen.pipeline import pipeline

results = pipeline(
    dataset='YAGO3-10',
    model='PairRE',
    training_loop='sLCWA',
    negative_sampler='bernoulli',
)
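Following [wang2014], $$p_r$$ can be computed from the relation's statistics: with tph the average number of tails per head and hpt the average number of heads per tail, $$p_r = \text{tph} / (\text{tph} + \text{hpt})$$. The sketch below (using hypothetical toy triples, not PyKEEN's internal implementation) illustrates the computation:

```python
from collections import defaultdict

# Hypothetical toy triples; "born_in" is many-to-one
kg = [
    ("a", "born_in", "x"),
    ("b", "born_in", "x"),
    ("c", "born_in", "y"),
]

def bernoulli_probabilities(triples):
    """Per relation r: p_r = tph / (tph + hpt), where tph is the average
    number of tails per head and hpt the average number of heads per tail."""
    tails_of = defaultdict(set)  # (r, h) -> {t}
    heads_of = defaultdict(set)  # (r, t) -> {h}
    for h, r, t in triples:
        tails_of[(r, h)].add(t)
        heads_of[(r, t)].add(h)
    probs = {}
    for r in {r for _, r, _ in triples}:
        tph_counts = [len(ts) for (r2, _), ts in tails_of.items() if r2 == r]
        hpt_counts = [len(hs) for (r2, _), hs in heads_of.items() if r2 == r]
        tph = sum(tph_counts) / len(tph_counts)
        hpt = sum(hpt_counts) / len(hpt_counts)
        probs[r] = tph / (tph + hpt)
    return probs

p = bernoulli_probabilities(kg)
# tph = 1 (each head has one tail), hpt = 1.5 (x has 2 heads, y has 1)
assert abs(p["born_in"] - 0.4) < 1e-9
```

For a many-to-one relation like this one, $$p_r < 0.5$$, so tail corruption is preferred: corrupting the tail of a many-to-one triple is less likely to accidentally produce a true fact.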


Classes

NegativeSampler(*, mapped_triples[, ...])
    A negative sampler.
BasicNegativeSampler(*[, corruption_scheme])
    A basic negative sampler.
BernoulliNegativeSampler(*, mapped_triples, ...)
    An implementation of the Bernoulli negative sampling approach proposed by [wang2014].
PseudoTypedNegativeSampler(*, mapped_triples, ...)
    A sampler that accounts for which entities co-occur with a relation.

Filtering

Because the corruption operations (see Corruption) are applied independently of the other triples in $$\mathcal{K}$$, the resulting candidate corrupt triples can overlap with known positive triples. Consider the following properties of a relation $$r$$:

one-to-many: for $$(h,r,t_1), (h,r,t_2) \in \mathcal{K}$$, tail corruption yields $$(h,r,t_2) \in \mathcal{T}(h,r,t_1) \land (h,r,t_1) \in \mathcal{T}(h,r,t_2)$$

multiple relations: for $$(h,r_1,t), (h,r_2,t) \in \mathcal{K}$$, relation corruption yields $$(h,r_2,t) \in \mathcal{R}(h,r_1,t) \land (h,r_1,t) \in \mathcal{R}(h,r_2,t)$$

many-to-one: for $$(h_1,r,t), (h_2,r,t) \in \mathcal{K}$$, head corruption yields $$(h_2,r,t) \in \mathcal{H}(h_1,r,t) \land (h_1,r,t) \in \mathcal{H}(h_2,r,t)$$

If no relation in $$\mathcal{K}$$ satisfies any of the properties relevant to the corruption scheme chosen for negative sampling, then there is guaranteed to be no overlap between $$\mathcal{N}$$ and $$\mathcal{K}$$, i.e., $$\mathcal{N} \cap \mathcal{K} = \emptyset$$. However, this scenario is very unlikely for real-world knowledge graphs.

The known positive triples that appear in $$\mathcal{N}$$ are known false negatives. Hence, we know that these are incorrect (negative) training examples, and might want to exclude them to reduce the training noise.
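A minimal sketch of how such a known false negative arises, and how exact set-based filtering removes it (hypothetical toy triples; r is many-to-one):

```python
kg = {("a", "r", "b"), ("c", "r", "b")}
entities = {"a", "b", "c"}

# Head corruption of ("a", "r", "b") regenerates the known positive ("c", "r", "b")
candidates = {(h2, "r", "b") for h2 in entities if h2 != "a"}
assert ("c", "r", "b") in candidates  # a known false negative

# Exact filtering: drop every candidate that is a known positive
filtered = candidates - kg
assert filtered == {("b", "r", "b")}
```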

Warning

Note that a corrupted triple that is not part of the knowledge graph can still represent a true fact. These “unknown” false negatives cannot be removed a priori in the filtered setting. The methodology relies on the assumption that the number of unknown false negatives is low enough for learning to take place.

However, in practice, $$|\mathcal{N}| \gg |\mathcal{K}|$$, so the likelihood of generating a false negative is rather low. Therefore, the additional filter step is often omitted to lower computational cost. This general observation might not hold for all entities; e.g., for a hub entity which is connected to many other entities, there may be a considerable number of false negatives without filtering.

Identifying False Negatives During Training

By default, PyKEEN does not filter false negatives from $$\mathcal{N}$$ during training. To enable filtering of negative examples during training, the filtered keyword can be given to negative_sampler_kwargs like in:

from pykeen.pipeline import pipeline

results = pipeline(
    dataset='YAGO3-10',
    model='PairRE',
    training_loop='sLCWA',
    negative_sampler='basic',
    negative_sampler_kwargs=dict(
        filtered=True,
    ),
)


PyKEEN implements several filtering algorithms with different properties that can be chosen using the filterer keyword argument in negative_sampler_kwargs. By default, a fast and approximate algorithm, pykeen.sampling.filtering.BloomFilterer, is used, which is based on Bloom filters. The Bloom filterer also has a configurable error rate, which can be lowered further at the cost of increased memory and computation.

from pykeen.pipeline import pipeline

results = pipeline(
    dataset='YAGO3-10',
    model='PairRE',
    training_loop='sLCWA',
    negative_sampler='basic',
    negative_sampler_kwargs=dict(
        filtered=True,
        filterer='bloom',
        filterer_kwargs=dict(
            error_rate=0.0001,
        ),
    ),
)
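To illustrate why a Bloom filter is a safe approximation here, the sketch below (a simplified, hypothetical implementation, not PyKEEN's) shows its one-sided error: a stored triple is always reported as present (so no known positive is missed), while an absent triple may occasionally be reported as present (a harmless extra filtering of one negative candidate).

```python
import hashlib

class TinyBloomFilter:
    """A minimal Bloom filter sketch using an int as the bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive k bit positions from salted hashes of the item
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True if every position is set; may be a false positive
        return all(self.bits >> pos & 1 for pos in self._positions(item))

bloom = TinyBloomFilter()
kg = {("a", "r", "b"), ("c", "r", "b")}
for triple in kg:
    bloom.add(triple)

# Every known positive is flagged: no false negatives by construction
assert all(t in bloom for t in kg)
```

Increasing num_bits and tuning num_hashes lowers the false positive rate, mirroring the error_rate trade-off described above.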


If you want a guarantee that all known false negatives are filtered, you can use a slower implementation based on Python's built-in sets, pykeen.sampling.filtering.PythonSetFilterer. It can be activated with:

from pykeen.pipeline import pipeline

results = pipeline(
    dataset='YAGO3-10',
    model='PairRE',
    training_loop='sLCWA',
    negative_sampler='basic',
    negative_sampler_kwargs=dict(
        filtered=True,
        filterer='python-set',
    ),
)


Identifying False Negatives During Evaluation

In contrast to training, PyKEEN does filter false negatives from $$\mathcal{N}$$ during evaluation by default. To disable the “filtered setting” during evaluation, the filtered keyword can be given to evaluator_kwargs like in:

from pykeen.pipeline import pipeline

results = pipeline(
    dataset='YAGO3-10',
    model='PairRE',
    evaluator_kwargs=dict(
        filtered=False,
    ),
)


Filtering during evaluation is implemented differently than in negative sampling:

First, there is no choice between an exact or approximate algorithm via a pykeen.sampling.filtering.Filterer. Instead, evaluation filtering modifies the scores in place rather than selecting only the non-filtered entries. The main reason is that evaluation is always done in 1:n scoring, so we gain some efficiency by keeping the tensor in its “dense” shape (batch_size, num_entities).
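The in-place variant can be sketched with hypothetical scores (plain Python lists here standing in for a dense score tensor): instead of removing filtered entries, their scores are overwritten with negative infinity so the shape stays (batch_size, num_entities) and ranks can be computed directly.

```python
import math

# Hypothetical tail scores for one evaluation triple over all entities,
# kept in dense shape (batch_size=1, num_entities=4)
scores = [[0.6, 0.1, 0.9, 0.3]]  # true tail is entity 0
true_tail = 0

unfiltered_rank = 1 + sum(s > scores[0][true_tail] for s in scores[0])

# Entity ids of *other* known positive tails for this (h, r) pair
filter_ids = [2]

# In-place filtering: mask known positives with -inf instead of removing them
for e in filter_ids:
    scores[0][e] = -math.inf

filtered_rank = 1 + sum(s > scores[0][true_tail] for s in scores[0])

# The competing known positive no longer pushes the true tail down the ranking
assert unfiltered_rank == 2 and filtered_rank == 1
```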

Second, filtering during evaluation must be exact, as it is crucial for reproducing results from the filtered setting. For evaluation, it makes sense to use all available information to obtain evaluation results that are as solid as possible.

Classes

Filterer
    An interface for filtering methods for negative triples.
BloomFilterer(mapped_triples[, error_rate])
    A filterer for negative triples based on the Bloom filter.
PythonSetFilterer(mapped_triples)
    A filterer using Python sets for filtering.