Extending the Models
====================
Before starting, read the tutorial on bringing your own interaction module.
This tutorial shows how to wrap a custom interaction module in a model
module for general reuse and application.

Implementing a model by subclassing :class:`pykeen.models.ERModel`
------------------------------------------------------------------
The following code block demonstrates how an interaction model can be used to define a full
KGEM using the :class:`pykeen.models.ERModel` base class.


.. code-block:: python

    from pykeen.models import ERModel
    from pykeen.nn import Embedding, Interaction


    class DistMultInteraction(Interaction):
        def forward(self, h, r, t):
            return (h * r * t).sum(dim=-1)


    class DistMult(ERModel):
        def __init__(
            self,
            # When defining your class, expose every configurable hyper-parameter as an
            # argument to the __init__() function. When running the pipeline(), these
            # are passed via the ``model_kwargs``.
            embedding_dim: int = 50,
            # All remaining arguments are simply passed through to the parent constructor. If you
            # want access to them, you can name them explicitly. See the pykeen.models.ERModel
            # documentation for a full list
            **kwargs,
        ) -> None:
            # Since this is a Python class, you are free to pre-process arguments here. One
            # example of pre-processing is to derive the shape for the relation representation
            # based on the embedding dimension.
            super().__init__(
                # Pass your interaction function, either as a class (as done here) or as an instance.
                # This is also a place where you can pass hyper-parameters, such as the L_p norm,
                # from the KGEM to the interaction function
                interaction=DistMultInteraction,
                # interaction_kwargs=dict(...),
                # Define the entity representations by giving the representation class and a dict
                # of its keyword arguments. By default, each embedding is a vector. You can use the
                # ``shape`` kwarg to specify higher dimensional tensor shapes.
                entity_representations=Embedding,
                entity_representations_kwargs=dict(
                    embedding_dim=embedding_dim,
                ),
                # Define the relation representations the same as the entities
                relation_representations=Embedding,
                relation_representations_kwargs=dict(
                    embedding_dim=embedding_dim,
                ),
                # All other arguments are passed through, such as the ``triples_factory``, ``loss``,
                # ``preferred_device``, and others. These are all handled by the pipeline() function
                **kwargs,
            )

The actual implementation of DistMult can be found in :class:`pykeen.models.DistMult`. Note that
it additionally contains configuration for the initializers, constrainers, and regularizers
for each of the embeddings as well as class-level defaults for hyper-parameters and hyper-parameter
optimization. Modifying these is covered in other tutorials.
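
As a sanity check on the scoring function, the arithmetic performed by ``DistMultInteraction.forward``
in the example above can be reproduced for a single triple in plain Python. This is only an
illustrative sketch: the helper ``distmult_score`` and the example vectors are made up here, and
neither PyTorch nor PyKEEN is involved.

.. code-block:: python

    def distmult_score(h, r, t):
        """Sum the element-wise product of head, relation, and tail vectors."""
        return sum(h_i * r_i * t_i for h_i, r_i, t_i in zip(h, r, t))


    # 1*3*5 + 2*4*6 = 15 + 48
    score = distmult_score([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])

In the interaction module itself, the same computation is applied to batched tensors, with the
sum taken over the last dimension.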

Specifying Defaults
~~~~~~~~~~~~~~~~~~~~
If you have a preferred loss function for your model, you can add the ``loss_default`` class variable
where the value is the loss class.

.. code-block:: python

    from typing import ClassVar, Type

    from pykeen.models import ERModel
    from pykeen.losses import Loss, NSSALoss

    class DistMult(ERModel):
        loss_default: ClassVar[Type[Loss]] = NSSALoss
        ...

Now, when using the pipeline, the :class:`pykeen.losses.NSSALoss` loss is used by default
if none is given. The same kind of modification can be made to set a default regularizer
with ``regularizer_default``.

Specifying Hyper-parameter Optimization Default Ranges
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All subclasses of :class:`pykeen.models.Model` can specify the default
ranges or values used during hyper-parameter optimization (HPO). PyKEEN
implements a simple dictionary-based configuration that is interpreted
by :func:`pykeen.hpo.hpo.suggest_kwargs` in the HPO pipeline.

HPO default ranges can be applied to all keyword arguments appearing in the
``__init__()`` function of your model by setting a class-level variable called
``hpo_default``.

For example, the ``embedding_dim`` can be specified as being on a range between
100 and 150 with the following:

.. code-block:: python

    class DistMult(ERModel):
        hpo_default = {
            'embedding_dim': dict(type=int, low=100, high=150)
        }
        ...

A step size can be imposed with ``q``:

.. code-block:: python

    class DistMult(ERModel):
        hpo_default = {
            'embedding_dim': dict(type=int, low=100, high=150, q=5)
        }
        ...

An alternative scale can be imposed with ``scale``. Right now, the
default is linear, and ``scale`` can optionally be set to ``power_two``
for integers as in:

.. code-block:: python

    class DistMult(ERModel):
        hpo_default = {
            # will uniformly give 16, 32, 64, 128 (left inclusive, right exclusive)
            'hidden_dim': dict(type=int, low=4, high=8, scale='power_two')
        }
        ...

.. warning:: Alternative scales cannot currently be used in combination with a step size (``q``).
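
The integer ranges above can be made concrete by listing the values they cover. The following is
a minimal sketch, not PyKEEN's implementation: ``int_candidates`` is a hypothetical helper, the
``power_two`` branch follows the left-inclusive, right-exclusive behavior stated in the comment
above, and including the upper bound for linear ranges is an assumption.

.. code-block:: python

    def int_candidates(low, high, q=1, scale=None):
        """List the integers a range specification could yield (hypothetical helper)."""
        if scale == "power_two":
            if q != 1:
                # Mirrors the warning: a step size cannot be combined with a scale.
                raise ValueError("a step size cannot be combined with an alternative scale")
            # ``low`` and ``high`` are exponents; ``high`` itself is excluded.
            return [2 ** exponent for exponent in range(low, high)]
        # Assumption: linear ranges include the upper bound.
        return list(range(low, high + 1, q))


    powers = int_candidates(4, 8, scale="power_two")  # exponents 4, 5, 6, 7
    stepped = int_candidates(100, 150, q=5)  # 100, 105, ..., 150 under the assumption above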

Besides ``int``, the ``type`` can also be specified as ``float``, ``categorical``,
or ``bool``.

With ``float``, you can neither use the ``q`` option nor set the scale to ``power_two``,
but the scale can be set to ``log`` (see :class:`optuna.distributions.LogUniformDistribution`).

.. code-block:: python

    hpo_default = {
        # will uniformly give floats on the range of [1.0, 2.0) (exclusive)
        'alpha': dict(type='float', low=1.0, high=2.0),

        # will give floats sampled log-uniformly on the range of [1.0, 8.0) (exclusive)
        'beta': dict(type='float', low=1.0, high=8.0, scale='log'),
    }

With ``categorical``, you can form a dictionary like the following using ``type='categorical'``
and giving a ``choices`` entry that contains a sequence of either integers, floats, or strings.

.. code-block:: python

    hpo_default = {
        'similarity': dict(type='categorical', choices=[...])
    }

With ``bool``, you can simply use ``dict(type=bool)`` or ``dict(type='bool')``.
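
To illustrate how such a dictionary-based configuration might be interpreted, here is a toy
sampler built only on the standard library. It is a sketch under stated assumptions and not the
code behind :func:`pykeen.hpo.hpo.suggest_kwargs`, which builds on :mod:`optuna`; the name
``toy_suggest``, the ``use_bias`` entry, and the ``choices`` values are invented for the example,
and the ``power_two`` scale is left out for brevity.

.. code-block:: python

    import math
    import random


    def toy_suggest(spec, rng):
        """Draw one value for a single ``hpo_default`` entry (toy stand-in)."""
        # The type may be given as a class (``int``) or as a string (``'int'``).
        kind = spec["type"]
        kind = kind if isinstance(kind, str) else kind.__name__
        if kind == "bool":
            return rng.choice([True, False])
        if kind == "categorical":
            return rng.choice(list(spec["choices"]))
        low, high = spec["low"], spec["high"]
        if kind == "int":
            # Assumption: integer ranges include both endpoints; ``q`` is the step size.
            return rng.randrange(low, high + 1, spec.get("q", 1))
        if kind == "float":
            if spec.get("scale") == "log":
                # Log-uniform: sample uniformly in log space, then map back.
                return math.exp(rng.uniform(math.log(low), math.log(high)))
            return rng.uniform(low, high)
        raise ValueError(f"unsupported type: {kind!r}")


    hpo_default = {
        "embedding_dim": dict(type=int, low=100, high=150, q=5),
        "alpha": dict(type="float", low=1.0, high=2.0),
        "beta": dict(type="float", low=1.0, high=8.0, scale="log"),
        "similarity": dict(type="categorical", choices=["dot", "cosine"]),  # made-up choices
        "use_bias": dict(type=bool),  # made-up hyper-parameter
    }

    rng = random.Random(0)  # seeded only to make the draw repeatable
    sampled = {name: toy_suggest(spec, rng) for name, spec in hpo_default.items()}

Each key of ``sampled`` corresponds to a keyword argument of the model's ``__init__()``, which is
why the keys of ``hpo_default`` have to match those argument names.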

.. note::

    The HPO rules are subject to change as they are tightly coupled to :mod:`optuna`,
    which since version 2.0.0 has introduced several new possibilities.

Implementing a model by instantiating :class:`pykeen.models.ERModel`
--------------------------------------------------------------------
Instead of creating a new class, you can also directly use the :class:`pykeen.models.ERModel`, for example:

.. code-block:: python

    from pykeen.models import ERModel
    from pykeen.losses import BCEWithLogitsLoss

    model = ERModel(
        triples_factory=...,
        loss="BCEWithLogits",
        interaction="transformer",
        entity_representations_kwargs=dict(embedding_dim=64),
        relation_representations_kwargs=dict(embedding_dim=64),
    )

Using a Custom Model with the Pipeline
--------------------------------------
This new model can be used with all available losses, evaluators,
training pipelines, and inverse triple modeling via :func:`pykeen.pipeline.pipeline`.
In addition to the names of models (given as strings), its ``model`` argument
also accepts model classes.

.. code-block:: python

    from pykeen.pipeline import pipeline

    pipeline(
        model=DistMult,
        dataset='Nations',
        loss='NSSA',
    )