Buffers

Data buffers store data for the bandits. This way, a bandit is not limited to the current batch of data but can also draw on previous data, e.g. during a complete retraining. Currently, only in-memory buffers are provided (TensorDataBuffer and ListDataBuffer), but more sophisticated buffers can be implemented by subclassing the AbstractBanditDataBuffer class.

To achieve different buffer strategies, one can provide a DataRetrievalStrategy to the buffer. The buffer will then call the strategy's get_training_indices method to obtain the indices of the data to use for training.

Currently, we provide two strategies:

AllDataRetrievalStrategy: Use all data for training.
SlidingWindowRetrievalStrategy: Use a sliding window of the last window_size data points for training.
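
To make the difference between the two strategies concrete, the following sketch computes which indices each one would select for a buffer of a given size. It uses plain Python lists for illustration only: the helper names `all_data_indices` and `sliding_window_indices` are hypothetical, and the library's own strategies return a `torch.Tensor` from `get_training_indices`. The behaviour when the buffer holds fewer than `window_size` samples (use everything available) is an assumption.

```python
def all_data_indices(total_samples: int) -> list[int]:
    """Select every sample currently in the buffer."""
    return list(range(total_samples))


def sliding_window_indices(total_samples: int, window_size: int) -> list[int]:
    """Select the most recent `window_size` samples.

    Assumption: if fewer samples exist, all of them are used.
    """
    start = max(0, total_samples - window_size)
    return list(range(start, total_samples))


print(all_data_indices(5))           # [0, 1, 2, 3, 4]
print(sliding_window_indices(5, 3))  # [2, 3, 4]
print(sliding_window_indices(2, 3))  # [0, 1]
```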

Custom strategies can be implemented by subclassing the DataRetrievalStrategy class and implementing the get_training_indices method.
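
As a rough sketch of what a custom strategy might look like, the example below keeps every k-th sample, counting back from the newest one. `IndexStrategy` is a hypothetical, list-based stand-in for DataRetrievalStrategy and `EveryKthRetrievalStrategy` is an invented name; a real implementation would return a `torch.Tensor` instead of a list. The selection involves no randomness, which matters because get_training_indices has to be deterministic for the TensorDataBuffer.

```python
from typing import Protocol


class IndexStrategy(Protocol):
    """Hypothetical stand-in for DataRetrievalStrategy (list-based)."""

    def get_training_indices(self, total_samples: int) -> list[int]: ...


class EveryKthRetrievalStrategy:
    """Hypothetical strategy: keep every k-th sample, counting back from the newest."""

    def __init__(self, k: int):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k

    def get_training_indices(self, total_samples: int) -> list[int]:
        # Walk backwards from the newest sample so it is always included,
        # then restore chronological order. No randomness is involved,
        # so repeated calls return the same indices.
        picked = range(total_samples - 1, -1, -self.k)
        return sorted(picked)


strategy: IndexStrategy = EveryKthRetrievalStrategy(k=3)
print(strategy.get_training_indices(10))  # [0, 3, 6, 9]
```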

Data Buffers

`AbstractBanditDataBuffer(retrieval_strategy)`

Bases: ABC, Dataset[BufferDataFormat[ActionInputType]], Generic[ActionInputType, StateDictType], Sized

Abstract base class for bandit data buffer management.

A data buffer stores contextualized actions, optional embedded actions (depending on the bandit algorithm), and corresponding rewards. It also implements a strategy for selecting which data points to use during training.

Parameters:

Name	Type	Description	Default
`retrieval_strategy`	`DataRetrievalStrategy`	Strategy for managing training data selection.	required

Source code in src/calvera/utils/data_storage.py

def __init__(self, retrieval_strategy: DataRetrievalStrategy):
    """Initialize the data buffer.

    Args:
        retrieval_strategy: Strategy for managing training data selection.
    """
    self.retrieval_strategy = retrieval_strategy

`add_batch(contextualized_actions, embedded_actions, rewards, chosen_actions)` `abstractmethod`

Add a batch of data points to the buffer.

Parameters:

Name	Type	Description	Default
`contextualized_actions`	`ActionInputType`	Tensor of contextualized actions. Shape: (buffer_size, n_features) or n_items tuple of tensors of shape (buffer_size, n_features).	required
`embedded_actions`	`Tensor \| None`	Optional tensor of embedded actions. Shape: (buffer_size, n_embedding_size).	required
`rewards`	`Tensor`	Tensor of rewards received for each action. Shape: (buffer_size,).	required
`chosen_actions`	`Tensor \| None`	The chosen actions one-hot encoded. Should only be provided if there is only a single context (e.g. NeuralLinear). Shape: (batch_size, n_actions).	required

Source code in src/calvera/utils/data_storage.py

@abstractmethod
def add_batch(
    self,
    contextualized_actions: ActionInputType,
    embedded_actions: torch.Tensor | None,
    rewards: torch.Tensor,
    chosen_actions: torch.Tensor | None,
) -> None:
    """Add a batch of data points to the buffer.

    Args:
        contextualized_actions: Tensor of contextualized actions.
            Shape: (buffer_size, n_features) or n_items tuple of tensors of shape (buffer_size, n_features).
        embedded_actions: Optional tensor of embedded actions.
            Shape: (buffer_size, n_embedding_size).
        rewards: Tensor of rewards received for each action.
            Shape: (buffer_size,).
        chosen_actions: The chosen actions one-hot encoded. Should only be provided
            if there is only a single context (e.g. NeuralLinear).
            Shape: (batch_size, n_actions).
    """
    pass

`get_batch(batch_size)` `abstractmethod`

Get batches of training data according to retrieval strategy.

Parameters:

Name	Type	Description	Default
`batch_size`	`int`	Size of the batch to return.	required

Returns:

Name	Type	Description
	`tuple[ActionInputType, Tensor \| None, Tensor, Tensor \| None]`	Tuple of (contextualized_actions, embedded_actions, rewards, chosen_actions) for the batch.
`contextualized_actions`	`ActionInputType`	Either a tensor of shape (batch_size, n_features) or a tuple of tensors.
`embedded_actions`	`Tensor \| None`	Optional tensor of shape (batch_size, n_embedding_size), or None if not used.
`rewards`	`Tensor`	Tensor of shape (batch_size,).
`chosen_actions`	`Tensor \| None`	Optional tensor of one-hot encoded chosen actions. Shape: (batch_size, n_actions).

Raises:

Type	Description
`ValueError`	If requested batch_size is larger than available data.

Source code in src/calvera/utils/data_storage.py

@abstractmethod
def get_batch(
    self,
    batch_size: int,
) -> tuple[ActionInputType, torch.Tensor | None, torch.Tensor, torch.Tensor | None]:
    """Get batches of training data according to retrieval strategy.

    Args:
        batch_size: Size of the batch to return.

    Returns:
        Tuple of (contextualized_actions, embedded_actions, rewards, chosen_actions) for the batch.
        contextualized_actions: ActionInputType - Either a tensor of shape (batch_size, n_features)
            or a tuple of tensors.
        embedded_actions: Optional tensor of shape (batch_size, n_embedding_size), or None if not used.
        rewards: Tensor of shape (batch_size,).
        chosen_actions: Optional tensor of one-hot encoded chosen actions. Shape: (batch_size, n_actions).

    Raises:
        ValueError: If requested batch_size is larger than available data.
    """
    pass

`update_embeddings(embedded_actions)` `abstractmethod`

Update the embedded actions in the buffer.

Parameters:

Name	Type	Description	Default
`embedded_actions`	`Tensor`	New embeddings for all contexts in buffer. Shape: (buffer_size, n_embedding_size).	required

Source code in src/calvera/utils/data_storage.py

@abstractmethod
def update_embeddings(self, embedded_actions: torch.Tensor) -> None:
    """Update the embedded actions in the buffer.

    Args:
        embedded_actions: New embeddings for all contexts in buffer.
            Shape: (buffer_size, n_embedding_size).
    """
    pass
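
Since update_embeddings expects new embeddings for all contexts in the buffer, an implementation will typically replace the stored embeddings wholesale. The sketch below illustrates that idea with nested lists instead of tensors. `ToyEmbeddingStore` is a hypothetical class, and the length check is an assumption about how a mismatch could be handled, not documented library behaviour.

```python
class ToyEmbeddingStore:
    """Hypothetical store holding one embedding per buffered sample."""

    def __init__(self) -> None:
        self.embedded_actions: list[list[float]] = []

    def update_embeddings(self, embedded_actions: list[list[float]]) -> None:
        # Assumption: the new embeddings must cover every stored sample,
        # in the same order as the samples were added.
        if len(embedded_actions) != len(self.embedded_actions):
            raise ValueError(
                f"Expected {len(self.embedded_actions)} embeddings, got {len(embedded_actions)}"
            )
        self.embedded_actions = [list(row) for row in embedded_actions]


store = ToyEmbeddingStore()
store.embedded_actions = [[0.0, 0.0], [0.0, 0.0]]
store.update_embeddings([[0.5, 0.1], [0.2, 0.9]])
print(store.embedded_actions)  # [[0.5, 0.1], [0.2, 0.9]]
```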

`state_dict()` `abstractmethod`

Get state dictionary for checkpointing.

Returns:

Type	Description
`StateDictType`	Dictionary containing all necessary state information for restoring the buffer.

Source code in src/calvera/utils/data_storage.py

@abstractmethod
def state_dict(
    self,
) -> StateDictType:
    """Get state dictionary for checkpointing.

    Returns:
        Dictionary containing all necessary state information for restoring the buffer.
    """
    pass

`load_state_dict(state_dict)` `abstractmethod`

Load state from checkpoint dictionary.

Parameters:

Name	Type	Description	Default
`state_dict`	`StateDictType`	Dictionary containing state information for restoring the buffer.	required

Source code in src/calvera/utils/data_storage.py

@abstractmethod
def load_state_dict(
    self,
    state_dict: StateDictType,
) -> None:
    """Load state from checkpoint dictionary.

    Args:
        state_dict: Dictionary containing state information for restoring the buffer.
    """
    pass

`TensorDataBuffer(retrieval_strategy, max_size=None, device=None)`

Bases: AbstractBanditDataBuffer[ActionInputType, BanditStateDict]

In-memory implementation of bandit data buffer.

Known limitations:

It cannot handle a varying number of actions over time.

Parameters:

Name	Type	Description	Default
`retrieval_strategy`	`DataRetrievalStrategy`	Strategy for managing training data selection.	required
`max_size`	`int \| None`	Optional maximum number of samples to store. None means unlimited.	`None`
`device`	`device \| None`	Device to store data on (default: CPU).	`None`

Source code in src/calvera/utils/data_storage.py

def __init__(
    self,
    retrieval_strategy: DataRetrievalStrategy,
    max_size: int | None = None,
    device: torch.device | None = None,
):
    """Initialize the in-memory buffer.

    Args:
        retrieval_strategy: Strategy for managing training data selection.
        max_size: Optional maximum number of samples to store. None means unlimited.
        device: Device to store data on (default: CPU).
    """
    super().__init__(retrieval_strategy)

    self.max_size = max_size
    self.device = device if device is not None else torch.device("cpu")

    self.contextualized_actions: None | torch.Tensor = None
    self.embedded_actions = torch.empty(0, 0, device=device)  # shape: (n, n_embedding_size)
    self.rewards = torch.empty(0, device=device)  # shape: (n,)
    self.chosen_actions = torch.empty(0, 0, device=device)  # shape: (n, n_actions)
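
The page does not state what happens once max_size is reached. One common policy, assumed here purely for illustration and not confirmed for TensorDataBuffer, is to drop the oldest samples first. With the standard library this can be sketched using a bounded deque:

```python
from collections import deque

max_size = 3
# A deque with maxlen discards items from the opposite end once it is full.
rewards: deque[float] = deque(maxlen=max_size)

for reward in [0.1, 0.2, 0.3, 0.4]:
    rewards.append(reward)

print(list(rewards))  # [0.2, 0.3, 0.4]
```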

`ListDataBuffer(retrieval_strategy, max_size=None)`

Bases: AbstractBanditDataBuffer[ActionInputType, BanditStateDict]

A list-based implementation of the bandit data buffer.

This implementation stores contextualized actions, optional embedded actions, rewards and chosen_actions in Python lists. The torch.Tensors are not concatenated but kept as individual list entries, and they remain on whatever device they were on when added.

Parameters:

Name	Type	Description	Default
`retrieval_strategy`	`DataRetrievalStrategy`	Strategy for selecting training samples.	required
`max_size`	`int \| None`	Optional maximum number of samples to store.	`None`

Source code in src/calvera/utils/data_storage.py

def __init__(self, retrieval_strategy: DataRetrievalStrategy, max_size: int | None = None):
    """Initialize the list-based buffer.

    Args:
        retrieval_strategy: Strategy for selecting training samples.
        max_size: Optional maximum number of samples to store.
    """
    super().__init__(retrieval_strategy)
    self.max_size = max_size
    self.contextualized_actions: list[ActionInputType] = []
    self.embedded_actions: list[torch.Tensor] = []  # Can store embeddings if provided
    self.rewards: list[float] = []
    self.chosen_actions: list[torch.Tensor] = []

Strategies

`DataRetrievalStrategy`

Bases: Protocol

Protocol defining how training data should be managed in the buffer.

This protocol represents a strategy for determining which data points from the buffer should be used during training. Different implementations can select data in various ways (e.g., all data, most recent data, etc.).

`get_training_indices(total_samples)`

Get indices of data points to use for training.

For the TensorDataBuffer this has to be deterministic.

Parameters:

Name	Type	Description	Default
`total_samples`	`int`	Total number of samples in the buffer.	required

Returns:

Type	Description
`Tensor`	Tensor of indices to use for training. Shape: (n_selected_samples,).

Source code in src/calvera/utils/data_storage.py

def get_training_indices(self, total_samples: int) -> torch.Tensor:
    """Get indices of data points to use for training.

    For the `TensorDataBuffer` this has to be deterministic.

    Args:
        total_samples: Total number of samples in the buffer.

    Returns:
        Tensor of indices to use for training.
        Shape: (n_selected_samples,).
    """
    ...

`SlidingWindowRetrievalStrategy(window_size)`

Bases: DataRetrievalStrategy

Strategy that uses only the last n data points from the buffer for training.

Parameters:

Name	Type	Description	Default
`window_size`	`int`	Number of most recent samples to use for training.	required

Source code in src/calvera/utils/data_storage.py

def __init__(self, window_size: int):
    """Initialize the sliding window strategy.

    Args:
        window_size: Number of most recent samples to use for training.
    """
    self.window_size = window_size

`AllDataRetrievalStrategy`

Bases: DataRetrievalStrategy

Strategy that uses all available data points in the buffer for training.
