Buffers

DataBuffers store data for the bandits. This way, a bandit is not limited to the current batch of data but can also draw on earlier data, e.g. during a complete retraining. Currently only in-memory buffers are provided, but more sophisticated buffers can be implemented by subclassing the AbstractBanditDataBuffer class.

To select which stored data is used for training, pass a DataRetrievalStrategy to the buffer. The buffer then calls the strategy's get_training_indices method to obtain the indices of the data points to train on.

Currently, we provide two strategies:

  • AllDataRetrievalStrategy: Use all data for training.

  • SlidingWindowRetrievalStrategy: Use a sliding window of the last window_size data points for training.

Custom strategies can be implemented by subclassing the DataRetrievalStrategy class and implementing the get_training_indices method.
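
As a rough illustration of a custom strategy, the sketch below defines a hypothetical StridedRetrievalStrategy that keeps every stride-th sample. It is not part of the library. To stay dependency-free it returns a plain Python list, whereas the documented get_training_indices returns a torch.Tensor, and it does not subclass DataRetrievalStrategy.

```python
class StridedRetrievalStrategy:
    """Hypothetical strategy: use every `stride`-th sample, always including the newest."""

    def __init__(self, stride: int):
        if stride < 1:
            raise ValueError("stride must be at least 1")
        self.stride = stride

    def get_training_indices(self, total_samples: int) -> list[int]:
        # Walk backwards from the newest sample so it is always selected,
        # then restore ascending order. The result is deterministic.
        # A real strategy would wrap this in a torch.Tensor.
        newest_first = range(total_samples - 1, -1, -self.stride)
        return sorted(newest_first)


strategy = StridedRetrievalStrategy(stride=3)
indices = strategy.get_training_indices(total_samples=10)
print(indices)
```

A deterministic rule like this one matches the requirement stated for TensorDataBuffer; a randomised strategy would not.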



Data Buffers

AbstractBanditDataBuffer(retrieval_strategy)

Bases: ABC, Dataset[BufferDataFormat[ActionInputType]], Generic[ActionInputType, StateDictType], Sized

Abstract base class for bandit data buffer management.

A data buffer stores contextualized actions, optional embedded actions (depending on the bandit algorithm), and corresponding rewards. It also implements a strategy for selecting which data points to use during training.

Parameters:

  • retrieval_strategy (DataRetrievalStrategy, required): Strategy for managing training data selection.

Source code in src/calvera/utils/data_storage.py
def __init__(self, retrieval_strategy: DataRetrievalStrategy):
    """Initialize the data buffer.

    Args:
        retrieval_strategy: Strategy for managing training data selection.
    """
    self.retrieval_strategy = retrieval_strategy

add_batch(contextualized_actions, embedded_actions, rewards, chosen_actions) abstractmethod

Add a batch of data points to the buffer.

Parameters:

  • contextualized_actions (ActionInputType, required): Tensor of contextualized actions. Shape: (buffer_size, n_features) or n_items tuple of tensors of shape (buffer_size, n_features).

  • embedded_actions (Tensor | None, required): Optional tensor of embedded actions. Shape: (buffer_size, n_embedding_size).

  • rewards (Tensor, required): Tensor of rewards received for each action. Shape: (buffer_size,).

  • chosen_actions (Tensor | None, required): The chosen actions one-hot encoded. Should only be provided if there is only a single context (e.g. NeuralLinear). Shape: (batch_size, n_actions).

Source code in src/calvera/utils/data_storage.py
@abstractmethod
def add_batch(
    self,
    contextualized_actions: ActionInputType,
    embedded_actions: torch.Tensor | None,
    rewards: torch.Tensor,
    chosen_actions: torch.Tensor | None,
) -> None:
    """Add a batch of data points to the buffer.

    Args:
        contextualized_actions: Tensor of contextualized actions.
            Shape: (buffer_size, n_features) or n_items tuple of tensors of shape (buffer_size, n_features).
        embedded_actions: Optional tensor of embedded actions.
            Shape: (buffer_size, n_embedding_size).
        rewards: Tensor of rewards received for each action.
            Shape: (buffer_size,).
        chosen_actions: The chosen actions one-hot encoded. Should only be provided
            if there is only a single context (e.g. NeuralLinear).
            Shape: (batch_size, n_actions).
    """
    pass
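
To make the contract of add_batch more concrete, here is a minimal sketch of a hypothetical ToyListBuffer. It is an assumption-laden stand-in, not the library's implementation: plain lists replace torch tensors, chosen_actions is omitted for brevity, and the length check is an assumed validation rule.

```python
class ToyListBuffer:
    """Hypothetical stand-in for a data buffer; lists replace torch tensors."""

    def __init__(self) -> None:
        self.contextualized_actions: list[list[float]] = []
        self.embedded_actions: list[list[float]] = []
        self.rewards: list[float] = []

    def add_batch(self, contextualized_actions, embedded_actions, rewards) -> None:
        # Assumed rule: every sample needs exactly one reward,
        # so the leading dimensions must match.
        if len(contextualized_actions) != len(rewards):
            raise ValueError("contextualized_actions and rewards differ in length")
        if embedded_actions is not None and len(embedded_actions) != len(rewards):
            raise ValueError("embedded_actions and rewards differ in length")

        self.contextualized_actions.extend(contextualized_actions)
        if embedded_actions is not None:
            self.embedded_actions.extend(embedded_actions)
        self.rewards.extend(rewards)

    def __len__(self) -> int:
        return len(self.rewards)


buffer = ToyListBuffer()
buffer.add_batch([[0.1, 0.2], [0.3, 0.4]], None, [1.0, 0.0])
buffer.add_batch([[0.5, 0.6]], None, [1.0])
print(len(buffer))
```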

get_batch(batch_size) abstractmethod

Get batches of training data according to retrieval strategy.

Parameters:

  • batch_size (int, required): Size of the batch to return.

Returns:

  • tuple[ActionInputType, Tensor | None, Tensor, Tensor | None]: Tuple of (contextualized_actions, embedded_actions, rewards, chosen_actions) for the batch.

  • contextualized_actions (ActionInputType): Either a tensor of shape (batch_size, n_features) or a tuple of tensors.

  • embedded_actions (Tensor | None): Tensor of shape (batch_size, n_embedding_size), or None if not used.

  • rewards (Tensor): Tensor of shape (batch_size,).

  • chosen_actions (Tensor | None): Optional tensor of one-hot encoded chosen actions. Shape: (batch_size, n_actions).

Raises:

  • ValueError: If the requested batch_size is larger than the available data.

Source code in src/calvera/utils/data_storage.py
@abstractmethod
def get_batch(
    self,
    batch_size: int,
) -> tuple[ActionInputType, torch.Tensor | None, torch.Tensor, torch.Tensor | None]:
    """Get batches of training data according to retrieval strategy.

    Args:
        batch_size: Size of the batch to return.

    Returns:
        Tuple of (contextualized_actions, embedded_actions, rewards, chosen_actions) for the batch.
        contextualized_actions: ActionInputType - Either a tensor of shape (batch_size, n_features)
            or a tuple of tensors.
        embedded_actions: Optional tensor of shape (batch_size, n_embedding_size), or None if not used.
        rewards: Tensor of shape (batch_size,).
        chosen_actions: Optional tensor of one-hot encoded chosen actions. Shape: (batch_size, n_actions).

    Raises:
        ValueError: If requested batch_size is larger than available data.
    """
    pass
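
The interplay between the retrieval strategy and get_batch can be sketched as follows. This is a simplified illustration rather than the library's code: it assumes uniform sampling without replacement among the indices the strategy selected, handles only the rewards, and uses a free function with a hypothetical signature.

```python
import random


def get_batch(rewards, training_indices, batch_size, rng):
    """Draw `batch_size` rewards from the samples named by `training_indices`."""
    # Mirrors the documented error case: more data requested than is available.
    if batch_size > len(training_indices):
        raise ValueError(
            f"Requested batch_size {batch_size} exceeds available samples "
            f"({len(training_indices)})"
        )
    # Assumption: sample uniformly without replacement.
    chosen = rng.sample(training_indices, batch_size)
    return [rewards[i] for i in chosen]


rewards = [0.0, 1.0, 0.5, 0.2, 0.9]
training_indices = [2, 3, 4]  # e.g. what a sliding window of size 3 would select
batch = get_batch(rewards, training_indices, batch_size=2, rng=random.Random(0))
print(batch)
```

Note that the limit is the number of samples the strategy selects, which under a sliding window can be smaller than the number of samples stored.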

update_embeddings(embedded_actions) abstractmethod

Update the embedded actions in the buffer.

Parameters:

  • embedded_actions (Tensor, required): New embeddings for all contexts in buffer. Shape: (buffer_size, n_embedding_size).

Source code in src/calvera/utils/data_storage.py
@abstractmethod
def update_embeddings(self, embedded_actions: torch.Tensor) -> None:
    """Update the embedded actions in the buffer.

    Args:
        embedded_actions: New embeddings for all contexts in buffer.
            Shape: (buffer_size, n_embedding_size).
    """
    pass
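
A typical reason to call update_embeddings is that the embedding network has changed, so the stored embeddings are stale. The sketch below illustrates that idea under stated assumptions: embed is a hypothetical toy embedding function, lists stand in for tensors, and the row-count check is an assumed safeguard rather than documented behaviour.

```python
def update_embeddings(stored_contexts, new_embeddings):
    """Return the replacement embeddings after checking there is one row per stored context."""
    if len(new_embeddings) != len(stored_contexts):
        raise ValueError("expected one embedding per stored context")
    return list(new_embeddings)


def embed(context, scale):
    # Hypothetical one-dimensional "network": a scaled sum of the features.
    return [scale * sum(context)]


contexts = [[1.0, 2.0], [3.0, 4.0]]
old_embeddings = [embed(c, scale=1.0) for c in contexts]

# After "retraining" (scale changes), recompute embeddings for ALL stored contexts.
new_embeddings = update_embeddings(contexts, [embed(c, scale=0.5) for c in contexts])
print(old_embeddings, new_embeddings)
```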

state_dict() abstractmethod

Get state dictionary for checkpointing.

Returns:

  • StateDictType: Dictionary containing all necessary state information for restoring the buffer.

Source code in src/calvera/utils/data_storage.py
@abstractmethod
def state_dict(
    self,
) -> StateDictType:
    """Get state dictionary for checkpointing.

    Returns:
        Dictionary containing all necessary state information for restoring the buffer.
    """
    pass

load_state_dict(state_dict) abstractmethod

Load state from checkpoint dictionary.

Parameters:

  • state_dict (StateDictType, required): Dictionary containing state information for restoring the buffer.

Source code in src/calvera/utils/data_storage.py
@abstractmethod
def load_state_dict(
    self,
    state_dict: StateDictType,
) -> None:
    """Load state from checkpoint dictionary.

    Args:
        state_dict: Dictionary containing state information for restoring the buffer.
    """
    pass
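
The checkpointing pair state_dict / load_state_dict is meant to round-trip the buffer's contents. A minimal sketch of that round trip, assuming a hypothetical ToyBuffer whose dictionary keys ("max_size", "rewards") are illustrative and not the library's actual BanditStateDict layout:

```python
import copy


class ToyBuffer:
    """Hypothetical buffer holding only rewards, to show the checkpoint round trip."""

    def __init__(self, max_size=None):
        self.max_size = max_size
        self.rewards = []

    def state_dict(self):
        # Copy so later changes to the buffer do not alter the checkpoint.
        return {"max_size": self.max_size, "rewards": copy.deepcopy(self.rewards)}

    def load_state_dict(self, state_dict):
        self.max_size = state_dict["max_size"]
        self.rewards = copy.deepcopy(state_dict["rewards"])


buffer = ToyBuffer(max_size=100)
buffer.rewards = [1.0, 0.0]
checkpoint = buffer.state_dict()

restored = ToyBuffer()
restored.load_state_dict(checkpoint)

buffer.rewards.append(0.5)  # does not affect the checkpoint or the restored buffer
print(restored.max_size, restored.rewards)
```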

TensorDataBuffer(retrieval_strategy, max_size=None, device=None)

Bases: AbstractBanditDataBuffer[ActionInputType, BanditStateDict]

In-memory implementation of bandit data buffer.

Known limitations:

  • It can't handle a varying number of actions over time.

Parameters:

  • retrieval_strategy (DataRetrievalStrategy, required): Strategy for managing training data selection.

  • max_size (int | None, default: None): Optional maximum number of samples to store. None means unlimited.

  • device (device | None, default: None): Device to store data on (default: CPU).

Source code in src/calvera/utils/data_storage.py
def __init__(
    self,
    retrieval_strategy: DataRetrievalStrategy,
    max_size: int | None = None,
    device: torch.device | None = None,
):
    """Initialize the in-memory buffer.

    Args:
        retrieval_strategy: Strategy for managing training data selection.
        max_size: Optional maximum number of samples to store. None means unlimited.
        device: Device to store data on (default: CPU).
    """
    super().__init__(retrieval_strategy)

    self.max_size = max_size
    self.device = device if device is not None else torch.device("cpu")

    self.contextualized_actions: None | torch.Tensor = None
    self.embedded_actions = torch.empty(0, 0, device=device)  # shape: (n, n_embedding_size)
    self.rewards = torch.empty(0, device=device)  # shape: (n,)
    self.chosen_actions = torch.empty(0, 0, device=device)  # shape: (n, n_actions)

ListDataBuffer(retrieval_strategy, max_size=None)

Bases: AbstractBanditDataBuffer[ActionInputType, BanditStateDict]

A list-based implementation of the bandit data buffer.

This implementation stores contextualized actions, optional embedded actions, rewards and chosen_actions in Python lists. The torch.Tensors are not concatenated; each one is kept as a separate list element, and its location (device) is left unchanged.

Parameters:

  • retrieval_strategy (DataRetrievalStrategy, required): Strategy for selecting training samples.

  • max_size (int | None, default: None): Optional maximum number of samples to store.

Source code in src/calvera/utils/data_storage.py
def __init__(self, retrieval_strategy: DataRetrievalStrategy, max_size: int | None = None):
    """Initialize the list-based buffer.

    Args:
        retrieval_strategy: Strategy for selecting training samples.
        max_size: Optional maximum number of samples to store.
    """
    super().__init__(retrieval_strategy)
    self.max_size = max_size
    self.contextualized_actions: list[ActionInputType] = []
    self.embedded_actions: list[torch.Tensor] = []  # Can store embeddings if provided
    self.rewards: list[float] = []
    self.chosen_actions: list[torch.Tensor] = []



Strategies

DataRetrievalStrategy

Bases: Protocol

Protocol defining how training data should be managed in the buffer.

This protocol represents a strategy for determining which data points from the buffer should be used during training. Different implementations can select data in various ways (e.g., all data, most recent data, etc.).

get_training_indices(total_samples)

Get indices of data points to use for training.

For the TensorDataBuffer this has to be deterministic.

Parameters:

  • total_samples (int, required): Total number of samples in the buffer.

Returns:

  • Tensor: Tensor of indices to use for training. Shape: (n_selected_samples,).

Source code in src/calvera/utils/data_storage.py
def get_training_indices(self, total_samples: int) -> torch.Tensor:
    """Get indices of data points to use for training.

    For the `TensorDataBuffer` this has to be deterministic.

    Args:
        total_samples: Total number of samples in the buffer.

    Returns:
        Tensor of indices to use for training.
        Shape: (n_selected_samples,).
    """
    ...
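
Because the strategy is declared as a Protocol, Python's structural typing means any object with a matching get_training_indices method satisfies it from a type checker's point of view. The sketch below shows the mechanism with a stand-in protocol named RetrievalStrategyLike (hypothetical, returning a list instead of a torch.Tensor). Marking it runtime_checkable is a choice made for this example only; whether the library's protocol supports isinstance checks is not stated here.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class RetrievalStrategyLike(Protocol):
    """Stand-in for the documented protocol (list instead of torch.Tensor)."""

    def get_training_indices(self, total_samples: int) -> list[int]: ...


class OldestHalfStrategy:
    """Hypothetical strategy; note that it does not inherit from the protocol."""

    def get_training_indices(self, total_samples: int) -> list[int]:
        return list(range(total_samples // 2))


class NotAStrategy:
    pass


# isinstance only checks that the method exists, not its signature.
print(isinstance(OldestHalfStrategy(), RetrievalStrategyLike))
print(isinstance(NotAStrategy(), RetrievalStrategyLike))
```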

SlidingWindowRetrievalStrategy(window_size)

Bases: DataRetrievalStrategy

Strategy that uses only the last window_size data points from the buffer for training.

Parameters:

  • window_size (int, required): Number of most recent samples to use for training.

Source code in src/calvera/utils/data_storage.py
def __init__(self, window_size: int):
    """Initialize the sliding window strategy.

    Args:
        window_size: Number of most recent samples to use for training.
    """
    self.window_size = window_size

AllDataRetrievalStrategy

Bases: DataRetrievalStrategy

Strategy that uses all available data points in the buffer for training.
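
The index arithmetic behind the two built-in strategies can be sketched as plain functions. This is an illustration, not the library's code: the function names are hypothetical, lists stand in for torch tensors, and it is an assumption that a buffer holding fewer than window_size samples simply yields all of them.

```python
def all_data_indices(total_samples: int) -> list[int]:
    """Select every stored sample."""
    return list(range(total_samples))


def sliding_window_indices(total_samples: int, window_size: int) -> list[int]:
    """Select the `window_size` most recent samples (the highest indices)."""
    # Assumption: clamp at 0 so a partially filled buffer returns everything it has.
    start = max(0, total_samples - window_size)
    return list(range(start, total_samples))


print(all_data_indices(5))
print(sliding_window_indices(total_samples=5, window_size=3))
print(sliding_window_indices(total_samples=2, window_size=3))
```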