Bandit Interface

Below is the interface that all bandit algorithms share, defined in the AbstractBandit class. The two outward-facing methods are forward(), used for inference, and training_step(), used for training. forward() validates the input contextualized actions before prediction, and training_step() validates the provided rewards and chosen contextualized actions before performing the update. When implementing a new bandit, the following methods need to be implemented:

  • _predict_action(self, contextualized_actions: ActionInputType, **kwargs) -> tuple[torch.Tensor, torch.Tensor]: Selects the action(s) for the given contextualized actions and returns them together with their probabilities.
  • _update(self, *args, **kwargs) -> torch.Tensor: Updates the bandit with the given batch of chosen contextualized actions and realized rewards, and returns a scalar loss.
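
For orientation, here is a minimal sketch of such a subclass. It assumes a single-tensor ActionInputType; the class name GreedyLinearBandit, the linear scorer, and the update rule are illustrative and not part of the library. It also assumes the selector is callable on a score tensor and returns a one-hot encoding, as its description below suggests.

import torch

from calvera.bandits.abstract_bandit import AbstractBandit  # import path inferred from the source listing below


class GreedyLinearBandit(AbstractBandit[torch.Tensor]):
    """Illustrative bandit: scores actions with a linear layer and picks greedily."""

    def __init__(self, n_features: int, **kwargs):
        super().__init__(n_features=n_features, **kwargs)
        self.automatic_optimization = False  # the interface does not use the Lightning optimizer
        self.scorer = torch.nn.Linear(n_features, 1)

    def _predict_action(self, contextualized_actions: torch.Tensor, **kwargs):
        # contextualized_actions: (batch_size, n_actions, n_features)
        scores = self.scorer(contextualized_actions).squeeze(-1)  # (batch_size, n_actions)
        chosen_actions = self.selector(scores)  # assumed: one-hot, (batch_size, n_actions)
        p = torch.ones(scores.shape[0], device=scores.device)  # deterministic choice -> probability 1
        return chosen_actions, p

    def _update(self, batch, batch_idx):
        contextualized_actions, _, realized_rewards, _ = batch
        # A real bandit would update its model here; this sketch only reports the loss.
        return -realized_rewards.mean()  # scalar (0-dim) tensor, e.g. the negative mean reward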

AbstractBandit(n_features, buffer=None, train_batch_size=32, selector=None)

Bases: ABC, LightningModule, Generic[ActionInputType]

Defines the interface for all bandit algorithms by implementing PyTorch Lightning Module methods.

Parameters:

  • n_features (int, required): The number of features in the contextualized actions.
  • buffer (AbstractBanditDataBuffer[ActionInputType, Any] | None, default: None): The buffer used for storing the data for continuously updating the neural network.
  • train_batch_size (int, default: 32): The mini-batch size used for the train loop (started by trainer.fit()).
  • selector (AbstractSelector | None, default: None): The selector used to choose the best action. Defaults to ArgMaxSelector if None.
Source code in src/calvera/bandits/abstract_bandit.py
def __init__(
    self,
    n_features: int,
    buffer: AbstractBanditDataBuffer[ActionInputType, Any] | None = None,
    train_batch_size: int = 32,
    selector: AbstractSelector | None = None,
):
    """Initializes the Bandit.

    Args:
        n_features: The number of features in the contextualized actions.
        buffer: The buffer used for storing the data for continuously updating the neural network.
        train_batch_size: The mini-batch size used for the train loop (started by `trainer.fit()`).
        selector: The selector used to choose the best action. Default is ArgMaxSelector (if None).
    """
    assert n_features > 0, "The number of features must be greater than 0."
    assert train_batch_size > 0, "The batch_size for training must be greater than 0."

    super().__init__()

    if buffer is None:
        self.buffer = TensorDataBuffer(
            retrieval_strategy=AllDataRetrievalStrategy(),
            max_size=None,
            device=self.device,
        )
    else:
        self.buffer = buffer

    self.selector = selector if selector is not None else ArgMaxSelector()

    self.save_hyperparameters(
        {
            "n_features": n_features,
            "train_batch_size": train_batch_size,
        }
    )
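
By way of example, a subclass might override the defaults as below. The import paths for TensorDataBuffer, AllDataRetrievalStrategy, and ArgMaxSelector are assumptions; only the class names appear in the listing above, and GreedyLinearBandit is the illustrative subclass sketched at the top of this page.

# Paths are assumptions; only the class names appear in the __init__ above.
from calvera.utils.data_storage import AllDataRetrievalStrategy, TensorDataBuffer
from calvera.utils.selectors import ArgMaxSelector

bandit = GreedyLinearBandit(
    n_features=16,
    # Bounded buffer instead of the unbounded default (max_size=None).
    buffer=TensorDataBuffer(retrieval_strategy=AllDataRetrievalStrategy(), max_size=10_000),
    train_batch_size=64,
    selector=ArgMaxSelector(),
)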

forward(*args, **kwargs)

Forward pass.

Given the contextualized actions, selects a single best action, or a set of actions in the case of combinatorial bandits. This can be computed for many samples in one batch.

Parameters:

  • contextualized_actions (ActionInputType, required): Tensor of shape (batch_size, n_actions, n_features), or a tuple of such tensors if the model takes several inputs.
  • *args (Any): Additional arguments. Passed to the _predict_action method.
  • **kwargs (Any): Additional keyword arguments. Passed to the _predict_action method.

Returns:

  • chosen_actions (Tensor): One-hot encoding of which actions were chosen. Shape: (batch_size, n_actions).
  • p (Tensor): The probability of the chosen actions. In the combinatorial case, this is one probability for the chosen superset of actions. Non-probabilistic algorithms should always return 1. Shape: (batch_size,).

Source code in src/calvera/bandits/abstract_bandit.py
def forward(
    self,
    *args: Any,
    **kwargs: Any,
) -> tuple[torch.Tensor, torch.Tensor]:
    """Forward pass.

    Given the contextualized actions, selects a single best action, or a set of actions in the case of combinatorial
    bandits. This can be computed for many samples in one batch.

    Args:
        contextualized_actions: Tensor of shape (batch_size, n_actions, n_features).
        *args: Additional arguments. Passed to the `_predict_action` method.
        **kwargs: Additional keyword arguments. Passed to the `_predict_action` method.

    Returns:
        chosen_actions: One-hot encoding of which actions were chosen.
            Shape: (batch_size, n_actions).
        p: The probability of the chosen actions. In the combinatorial case,
            this will be one probability for the chosen superset of actions. Non-probabilistic
            algorithms should always return 1. Shape: (batch_size, ).
    """
    contextualized_actions = kwargs.get(
        "contextualized_actions", args[0] if args else None
    )  # shape: (batch_size, n_actions, n_features)
    assert contextualized_actions is not None, "contextualized_actions must be passed."

    if isinstance(contextualized_actions, torch.Tensor):
        assert contextualized_actions.ndim >= 3, (
            "Chosen actions must have shape (batch_size, num_actions, ...) "
            f"but got shape {contextualized_actions.shape}"
        )
        batch_size = contextualized_actions.shape[0]
    elif isinstance(contextualized_actions, tuple | list):
        assert len(contextualized_actions) > 1, "Tuple must contain at least 2 tensors"
        assert contextualized_actions[0].ndim >= 3, (
            "Chosen actions must have shape (batch_size, num_actions, ...) "
            f"but got shape {contextualized_actions[0].shape}"
        )
        batch_size = contextualized_actions[0].shape[0]
        assert all(
            action_item.ndim >= 3 for action_item in contextualized_actions
        ), "All tensors in tuple must have shape (batch_size, num_actions, ...)"
    else:
        raise ValueError(
            f"Contextualized actions must be a torch.Tensor or a tuple of torch.Tensors."
            f"Received {type(contextualized_actions)}."
        )

    result, p = self._predict_action(*args, **kwargs)

    # assert result.shape[0] == batch_size, (
    #     f"Batch size mismatch. Expected shape {batch_size} but got {result.shape[0]}"
    # )

    assert (
        p.ndim == 1 and p.shape[0] == batch_size and torch.all(p >= 0) and torch.all(p <= 1)
    ), f"The probabilities must be between 0 and 1 and have shape ({batch_size},) but got shape {p.shape}"

    return result, p
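
A short usage sketch of the inference path, using the illustrative GreedyLinearBandit from the top of the page; the shapes follow the assertions above.

import torch

bandit = GreedyLinearBandit(n_features=4)  # illustrative subclass from the sketch above

x = torch.randn(32, 10, 4)  # (batch_size=32, n_actions=10, n_features=4)
chosen_actions, p = bandit(x)  # forward() validates the input, then calls _predict_action

assert chosen_actions.shape == (32, 10)  # one-hot per sample
assert p.shape == (32,)                  # probability of each chosen action, here all ones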

training_step(batch, batch_idx)

Perform a single update step.

See the documentation for the LightningModule's training_step method. Acts as a wrapper for the _update method in case we want to change something for every bandit or use the update independently from lightning, e.g. in tests.

Parameters:

  • batch (BufferDataFormat[ActionInputType], required): The output of your data iterable, usually a DataLoader. It contains four elements: contextualized_actions with shape (batch_size, n_chosen_actions, n_features); embedded_actions with shape (batch_size, n_chosen_actions, n_features), which may be None and is only passed and required for certain bandits like the NeuralLinearBandit; realized_rewards with shape (batch_size, n_chosen_actions); and chosen_actions, which may be None and is likewise only required for certain bandits.
  • batch_idx (int, required): The index of this batch. Note that if a separate DataLoader is used for each step, this will be reset for each new data loader.

Returns:

  • Tensor: The loss value. In most cases, it makes sense to return the negative reward. Shape: scalar (a 0-dimensional tensor, see the assertion in the source below). Since we do not use the lightning optimizer, this value is only relevant for logging/visualization of the training process.

Source code in src/calvera/bandits/abstract_bandit.py
def training_step(self, batch: BufferDataFormat[ActionInputType], batch_idx: int) -> torch.Tensor:
    """Perform a single update step.

    See the documentation for the LightningModule's `training_step` method.
    Acts as a wrapper for the `_update` method in case we want to change something for every bandit or use the
    update independently from lightning, e.g. in tests.

    Args:
        batch: The output of your data iterable, usually a DataLoader. It contains four elements:
            contextualized_actions: shape (batch_size, n_chosen_actions, n_features).
            embedded_actions: shape (batch_size, n_chosen_actions, n_features). May be None; only
                passed and required for certain bandits like the NeuralLinearBandit.
            realized_rewards: shape (batch_size, n_chosen_actions).
            chosen_actions: May be None; only passed and required for certain bandits.
        batch_idx: The index of this batch. Note that if a separate DataLoader is used for each step,
            this will be reset for each new data loader.

    Returns:
        The loss value. In most cases, it makes sense to return the negative reward.
            Shape: scalar (a 0-dimensional tensor). Since we do not use the lightning optimizer,
            this value is only relevant for logging/visualization of the training process.
    """
    assert len(batch) == 4, (
        "Batch must contain four tensors: (contextualized_actions, embedded_actions, rewards, chosen_actions)."
        "`embedded_actions` and `chosen_actions` can be None."
    )

    realized_rewards: torch.Tensor = batch[2]  # shape: (batch_size, n_chosen_arms)

    assert realized_rewards.ndim == 2, "Rewards must have shape (batch_size, n_chosen_arms)"
    assert realized_rewards.device == self.device, "Realized reward must be on the same device as the model."

    batch_size, n_chosen_arms = realized_rewards.shape

    (
        contextualized_actions,
        embedded_actions,
    ) = batch[:2]

    if self._custom_data_loader_passed:
        self.record_feedback(contextualized_actions, realized_rewards)

    if isinstance(contextualized_actions, torch.Tensor):
        assert (
            contextualized_actions.device == self.device
        ), "Contextualized actions must be on the same device as the model."

        assert contextualized_actions.ndim >= 3, (
            f"Chosen actions must have shape (batch_size, n_chosen_arms, ...) "
            f"but got shape {contextualized_actions.shape}"
        )
        assert contextualized_actions.shape[0] == batch_size and contextualized_actions.shape[1] == n_chosen_arms, (
            "Chosen contextualized actions must have shape (batch_size, n_chosen_arms, ...) "
            f"same as reward. Expected shape ({(batch_size, n_chosen_arms)}, ...) "
            f"but got shape {contextualized_actions.shape}"
        )
    elif isinstance(contextualized_actions, tuple | list):
        assert all(
            action.device == self.device for action in contextualized_actions
        ), "Contextualized actions must be on the same device as the model."

        assert len(contextualized_actions) > 1 and contextualized_actions[0].ndim >= 3, (
            "The tuple of contextualized_actions must contain more than one element and be of shape "
            "(batch_size, n_chosen_arms, ...)."
        )
        assert (
            contextualized_actions[0].shape[0] == batch_size and contextualized_actions[0].shape[1] == n_chosen_arms
        ), (
            "Chosen contextualized actions must have shape (batch_size, n_chosen_arms, ...) "
            f"same as reward. Expected shape ({(batch_size, n_chosen_arms)}, ...) "
            f"but got shape {contextualized_actions[0].shape}"
        )
    else:
        raise ValueError(
            f"Contextualized actions must be a torch.Tensor or a tuple of torch.Tensors. "
            f"Received {type(contextualized_actions)}."
        )

    if embedded_actions is not None:
        assert embedded_actions.device == self.device, "Embedded actions must be on the same device as the model."
        assert (
            embedded_actions.ndim == 3
        ), "Embedded actions must have shape (batch_size, n_chosen_arms, n_features)"
        assert embedded_actions.shape[0] == batch_size and embedded_actions.shape[1] == n_chosen_arms, (
            "Chosen embedded actions must have shape (batch_size, n_chosen_arms, n_features) "
            f"same as reward. Expected shape ({(batch_size, n_chosen_arms)}, n_features) "
            f"but got shape {embedded_actions[0].shape}"
        )

    loss = self._update(
        batch,
        batch_idx,
    )

    assert loss.ndim == 0, "Loss must be a scalar value."

    return loss
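
As the docstring notes, training_step can also be exercised without Lightning, e.g. in tests. A sketch, again with the illustrative GreedyLinearBandit, assuming neither embedded nor chosen actions are needed and that the buffer bookkeeping not shown in this excerpt (e.g. _custom_data_loader_passed) is satisfied.

import torch

bandit = GreedyLinearBandit(n_features=4)  # illustrative subclass from the first sketch

contextualized_actions = torch.randn(8, 1, 4)  # (batch_size, n_chosen_actions, n_features)
realized_rewards = torch.rand(8, 1)            # (batch_size, n_chosen_actions)

# Four-element batch; embedded_actions and chosen_actions may be None.
batch = (contextualized_actions, None, realized_rewards, None)
loss = bandit.training_step(batch, batch_idx=0)

assert loss.ndim == 0  # scalar loss, per the assertion above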

_predict_action(contextualized_actions, **kwargs) abstractmethod

Forward pass, computed batch-wise.

Given the contextualized actions, selects a single best action, or a set of actions in the case of combinatorial bandits. Alongside the action(s), the selector also returns the probability of choosing this action. This allows for logging and Batch Learning from Logged Bandit Feedback (BLBF). Deterministic algorithms like UCB will always return 1.

Parameters:

  • contextualized_actions (ActionInputType, required): Input into the bandit or network containing all actions. Either a Tensor of shape (batch_size, n_actions, n_features) or a tuple of such tensors if there are several inputs to the model.
  • **kwargs (Any): Additional keyword arguments.

Returns:

  • chosen_actions (Tensor): One-hot encoding of which actions were chosen. Shape: (batch_size, n_actions).
  • p (Tensor): The probability of the chosen actions. In the combinatorial case, this is one probability for the chosen superset of actions. Deterministic algorithms (like UCB) should always return 1. Shape: (batch_size,).

Source code in src/calvera/bandits/abstract_bandit.py
@abstractmethod
def _predict_action(
    self,
    contextualized_actions: ActionInputType,
    **kwargs: Any,
) -> tuple[torch.Tensor, torch.Tensor]:
    """Forward pass, computed batch-wise.

    Given the contextualized actions, selects a single best action, or a set of actions in the case of combinatorial
    bandits. Alongside the action(s), the selector also returns the probability of choosing this action. This will
    allow for logging and Batch Learning from Logged Bandit Feedback (BLBF). Deterministic algorithms like UCB will
    always return 1.

    Args:
        contextualized_actions: Input into bandit or network containing all actions. Either Tensor of shape
            (batch_size, n_actions, n_features) or a tuple of tensors of shape (batch_size, n_actions, n_features)
            if there are several inputs to the model.
        **kwargs: Additional keyword arguments.

    Returns:
        chosen_actions: One-hot encoding of which actions were chosen.
            Shape: (batch_size, n_actions).
        p: The probability of the chosen actions. In the combinatorial case,
            this will be one probability for the chosen superset of actions. Deterministic algorithms (like UCB) should
            always return 1. Shape: (batch_size, ).
    """
    pass
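
To illustrate a genuinely probabilistic p, here is a hedged epsilon-greedy variant of _predict_action. The scorer attribute and the fixed epsilon are illustrative; the point is that the returned p is the propensity of the realized choice, which BLBF estimators need.

def _predict_action(self, contextualized_actions: torch.Tensor, **kwargs):
    # Epsilon-greedy sketch: explore uniformly with probability eps, else exploit.
    batch_size, n_actions, _ = contextualized_actions.shape
    eps = 0.1  # illustrative fixed exploration rate

    scores = self.scorer(contextualized_actions).squeeze(-1)  # (batch_size, n_actions)
    greedy = torch.argmax(scores, dim=1)
    uniform = torch.randint(n_actions, (batch_size,), device=scores.device)
    explore = torch.rand(batch_size, device=scores.device) < eps
    picked = torch.where(explore, uniform, greedy)

    chosen_actions = torch.nn.functional.one_hot(picked, n_actions).float()
    # Propensity of the realized choice: eps/n_actions for any action, plus
    # (1 - eps) when the pick coincides with the greedy action.
    p = eps / n_actions + (1 - eps) * (picked == greedy).float()
    return chosen_actions, p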

_update(*args, **kwargs) abstractmethod

Abstract method to perform a single update step. Should be implemented by the concrete bandit classes.

Parameters:

  • batch: The output of your data iterable, usually a DataLoader. It contains four elements: contextualized_actions with shape (batch_size, n_chosen_actions, n_features); embedded_actions with shape (batch_size, n_chosen_actions, n_features), which may be None and is only passed and required for certain bandits like the NeuralLinearBandit; realized_rewards with shape (batch_size, n_chosen_actions); and chosen_actions, which may be None and is likewise only required for certain bandits like the NeuralLinearBandit.
  • batch_idx: The index of this batch. Note that if a separate DataLoader is used for each step, this will be reset for each new data loader.
  • data_loader_idx: The index of the data loader. This is useful if you have multiple data loaders at once and want to do something different for each one.
  • *args (Any): Additional arguments.
  • **kwargs (Any): Additional keyword arguments.

Returns:

  • Tensor: The loss value. In most cases, it makes sense to return the negative reward. Shape: scalar (a 0-dimensional tensor). Since we do not use the lightning optimizer, this value is only relevant for logging/visualization of the training process.

Source code in src/calvera/bandits/abstract_bandit.py
@abstractmethod
def _update(
    self,
    *args: Any,
    **kwargs: Any,
) -> torch.Tensor:
    """Abstract method to perform a single update step. Should be implemented by the concrete bandit classes.

    Args:
        batch: The output of your data iterable, usually a DataLoader. It contains four elements:
            contextualized_actions: shape (batch_size, n_chosen_actions, n_features).
            embedded_actions: shape (batch_size, n_chosen_actions, n_features). May be None; only
                passed and required for certain bandits like the NeuralLinearBandit.
            realized_rewards: shape (batch_size, n_chosen_actions).
            chosen_actions: May be None; only passed and required for certain bandits like the
                NeuralLinearBandit.
        batch_idx: The index of this batch. Note that if a separate DataLoader is used for each step,
            this will be reset for each new data loader.
        data_loader_idx: The index of the data loader. This is useful if you have multiple data loaders
            at once and want to do something different for each one.
        *args: Additional arguments.
        **kwargs: Additional keyword arguments.

    Returns:
        The loss value. In most cases, it makes sense to return the negative reward.
            Shape: scalar (a 0-dimensional tensor). Since we do not use the lightning optimizer,
            this value is only relevant for logging/visualization of the training process.
    """
    pass
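
A hedged sketch of a concrete _update that takes one manual gradient step (the interface does not use the Lightning optimizer, as noted above). The scorer and the _optimizer attribute are illustrative and would be created in the subclass's __init__.

def _update(self, batch, batch_idx):
    contextualized_actions, _, realized_rewards, _ = batch

    # Regress predicted rewards of the chosen actions onto the realized rewards.
    predicted = self.scorer(contextualized_actions).squeeze(-1)  # (batch_size, n_chosen_arms)
    loss = torch.nn.functional.mse_loss(predicted, realized_rewards)

    self._optimizer.zero_grad()  # illustrative: a torch.optim optimizer built in __init__
    loss.backward()
    self._optimizer.step()

    return loss.detach()  # scalar, matching the ndim == 0 assertion in training_step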

DummyBandit(n_features, k=1)

Bases: AbstractBandit[ActionInputType]

A dummy bandit that always selects random actions.

Parameters:

  • n_features (int, required): The number of features in the bandit model. Must be positive.
  • k (int, default: 1): Number of actions to select. Must be positive.
Source code in src/calvera/bandits/abstract_bandit.py
def __init__(self, n_features: int, k: int = 1) -> None:
    """Initializes a DummyBandit with a RandomSelector.

    Args:
        n_features: The number of features in the bandit model. Must be positive.
        k: Number of actions to select. Must be positive. Default is 1.
    """
    super().__init__(
        selector=RandomSelector(k=k),
        n_features=n_features,
    )
    self.automatic_optimization = False
    # Lightning requires at least one parameter to be registered in order to train the module on CUDA.
    self.register_parameter("_", None)