BloomFilterer
- class BloomFilterer(mapped_triples, error_rate=0.001)
Bases: pykeen.sampling.filtering.Filterer
A filterer for negative triples based on the Bloom filter.
Implemented in pure PyTorch as a proper module, so it can be moved to a GPU and supports batch-wise computation.
See also
https://github.com/hiway/python-bloom-filter/ - for calculation of sizes, and rough structure of code
https://github.com/skeeto/hash-prospector#two-round-functions - for parts of the hash function
Initialize the Bloom-filter-based filterer.
- Parameters
mapped_triples (LongTensor) – The ID-based triples.
error_rate (float) – The desired error rate.
Methods Summary
add(triples) – Add triples to the Bloom filter.
contains(batch) – Check whether a triple is contained.
num_bits(num[, error_rate]) – Determine the required number of bits.
num_probes(num_elements, num_bits) – Determine the number of probes / hashing rounds.
probe(batch) – Iterate over indices from the probes.
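The sizing done by num_bits and num_probes can be illustrated with the textbook Bloom filter formulas, which the referenced python-bloom-filter project also uses. The following is a minimal sketch, not the library's code: the function names bloom_num_bits and bloom_num_probes are hypothetical, and the rounding used by BloomFilterer itself may differ.

```python
import math


def bloom_num_bits(num_elements: int, error_rate: float = 0.001) -> int:
    """Bits needed to hold `num_elements` items at the target false-positive rate."""
    # Textbook formula: m = -n * ln(p) / (ln 2)^2, rounded up.
    return math.ceil(-num_elements * math.log(error_rate) / math.log(2) ** 2)


def bloom_num_probes(num_elements: int, num_bits: int) -> int:
    """Number of hashing rounds that roughly minimizes the false-positive rate."""
    # Textbook formula: k = (m / n) * ln 2, rounded up so at least one probe is used.
    return math.ceil(num_bits / num_elements * math.log(2))


# Hypothetical workload: one million triples at the default error rate.
bits = bloom_num_bits(1_000_000, error_rate=0.001)
probes = bloom_num_probes(1_000_000, bits)
print(bits, probes)
```

Under these formulas, an error rate of 0.001 costs roughly 14.4 bits per indexed triple, independent of how many triples are stored.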
Methods Documentation
- contains(batch)
Check whether a triple is contained.
- Parameters
batch (LongTensor) – shape: (batch_size, 3). The batch of triples.
- Return type
BoolTensor
- Returns
shape: (batch_size,). One flag per triple. False guarantees that the triple is not among the indexed triples; True may be a false positive.
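The one-sided guarantee of contains follows from how a Bloom filter stores items. The sketch below is a pure-Python illustration under stated assumptions, not the PyTorch implementation documented here: the class TinyBloomFilter is hypothetical, a salted SHA-256 stands in for the integer hash rounds, and one byte is used per bit for simplicity.

```python
import hashlib
import math


class TinyBloomFilter:
    """Illustrative membership filter over (head, relation, tail) ID triples."""

    def __init__(self, triples, error_rate=0.001):
        n = max(len(triples), 1)
        # Textbook sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 probes.
        self.num_bits = math.ceil(-n * math.log(error_rate) / math.log(2) ** 2)
        self.num_probes = math.ceil(self.num_bits / n * math.log(2))
        self.bits = bytearray(self.num_bits)  # one byte per bit, for simplicity
        for triple in triples:
            for index in self._probe(triple):
                self.bits[index] = 1

    def _probe(self, triple):
        # Assumption: a salted SHA-256 replaces the real integer hash function.
        for round_id in range(self.num_probes):
            digest = hashlib.sha256(repr((round_id, *triple)).encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def contains(self, batch):
        # A triple can only have been indexed if every probed bit is set.
        return [all(self.bits[i] for i in self._probe(t)) for t in batch]


known = [(0, 0, 1), (1, 0, 2), (2, 1, 3)]
bloom = TinyBloomFilter(known)
print(bloom.contains(known))        # indexed triples are always reported as True
print(bloom.contains([(9, 9, 9)]))  # usually False; True would be a false positive
```

Adding a triple sets all of its probed bits, so an indexed triple can never be reported as False. An unseen triple is reported as True only when other triples happen to have set all of its bits, which is the error that error_rate bounds.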