Buffers
Data buffers store data for the bandits. This way, a bandit is not limited to the current batch of data but can also look at previous data, e.g. during a complete retraining.
Currently we only provide a simple in-memory buffer, but more sophisticated buffers can be implemented by subclassing the AbstractBanditDataBuffer class.
To achieve different buffer strategies, one can provide a DataRetrievalStrategy to the buffer. The buffer will then use the get_training_indices method to get the indices of the data to use for training.
Currently, we provide two strategies:
- AllDataRetrievalStrategy: Use all data for training.
- SlidingWindowRetrievalStrategy: Use a sliding window of the last window_size data points for training.
Custom strategies can be implemented by subclassing the DataRetrievalStrategy class and implementing the get_training_indices method.
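As a minimal sketch of such a custom strategy, the example below defines a hypothetical EveryKthRetrievalStrategy that keeps every k-th sample counted back from the newest one. Two assumptions keep it runnable without dependencies: it uses a local stand-in protocol (RetrievalStrategyLike) instead of importing DataRetrievalStrategy, and it returns a plain list of integers where the real method returns a Tensor.

```python
from typing import List, Protocol


class RetrievalStrategyLike(Protocol):
    """Local stand-in for DataRetrievalStrategy (an assumption, not the library class)."""

    def get_training_indices(self, total_samples: int) -> List[int]: ...


class EveryKthRetrievalStrategy:
    """Hypothetical strategy: keep every k-th sample, always including the newest."""

    def __init__(self, k: int) -> None:
        if k <= 0:
            raise ValueError("k must be positive")
        self.k = k

    def get_training_indices(self, total_samples: int) -> List[int]:
        # Walk backwards from the newest sample in steps of k, then restore
        # chronological order. The result depends only on total_samples.
        return sorted(range(total_samples - 1, -1, -self.k))


strategy: RetrievalStrategyLike = EveryKthRetrievalStrategy(k=3)
indices = strategy.get_training_indices(10)
print(indices)
```

With the real library, one would presumably subclass DataRetrievalStrategy and wrap the result in a tensor of shape (n_selected_samples,).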
Data Buffers
AbstractBanditDataBuffer(retrieval_strategy)
Bases: ABC, Dataset[BufferDataFormat[ActionInputType]], Generic[ActionInputType, StateDictType], Sized
Abstract base class for bandit data buffer management.
A data buffer stores contextualized actions, optional embedded actions (depending on the bandit algorithm), and corresponding rewards. It also implements a strategy for selecting which data points to use during training.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
retrieval_strategy | DataRetrievalStrategy | Strategy for managing training data selection. | required |
Source code in src/calvera/utils/data_storage.py
add_batch(contextualized_actions, embedded_actions, rewards, chosen_actions)
abstractmethod
Add a batch of data points to the buffer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
contextualized_actions | ActionInputType | Tensor of contextualized actions. Shape: (buffer_size, n_features) or n_items tuple of tensors of shape (buffer_size, n_features). | required |
embedded_actions | Tensor \| None | Optional tensor of embedded actions. Shape: (buffer_size, n_embedding_size). | required |
rewards | Tensor | Tensor of rewards received for each action. Shape: (buffer_size,). | required |
chosen_actions | Tensor \| None | The chosen actions, one-hot encoded. Should only be provided if there is only a single context (e.g. NeuralLinear). Shape: (batch_size, n_actions). | required |
get_batch(batch_size)
abstractmethod
Get batches of training data according to retrieval strategy.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_size | int | Size of the batch to return. | required |
Returns:
Tuple of (contextualized_actions, embedded_actions, rewards, chosen_actions) for the batch, typed tuple[ActionInputType, Tensor | None, Tensor, Tensor | None]. Its elements are:
Name | Type | Description |
---|---|---|
contextualized_actions | ActionInputType | Either a tensor of shape (batch_size, n_features) or a tuple of tensors. |
embedded_actions | Tensor \| None | Optional tensor of shape (batch_size, n_embedding_size), or None if not used. |
rewards | Tensor | Tensor of shape (batch_size,). |
chosen_actions | Tensor \| None | Optional tensor of one-hot encoded chosen actions. Shape: (batch_size, n_actions). |
Raises:
Type | Description |
---|---|
ValueError | If requested batch_size is larger than available data. |
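The interplay of add_batch, get_batch and the retrieval strategy can be illustrated with a toy buffer. This is only a sketch under stated assumptions: ToyListBuffer and last_two are hypothetical names, plain Python lists stand in for tensors, embedded actions and chosen actions are left out, and the batch is simply taken from the most recent eligible samples, which may differ from how the library picks them.

```python
from typing import Callable, List, Tuple


class ToyListBuffer:
    """Illustrative stand-in for a bandit data buffer (not the calvera class)."""

    def __init__(self, get_training_indices: Callable[[int], List[int]]) -> None:
        self._get_training_indices = get_training_indices
        self._contexts: List[List[float]] = []
        self._rewards: List[float] = []

    def add_batch(self, contextualized_actions: List[List[float]], rewards: List[float]) -> None:
        if len(contextualized_actions) != len(rewards):
            raise ValueError("contextualized_actions and rewards must have the same length")
        self._contexts.extend(contextualized_actions)
        self._rewards.extend(rewards)

    def get_batch(self, batch_size: int) -> Tuple[List[List[float]], List[float]]:
        # Ask the strategy which stored samples are eligible for training.
        eligible = self._get_training_indices(len(self._rewards))
        if batch_size > len(eligible):
            raise ValueError("requested batch_size is larger than available data")
        # Assumption: take the most recent eligible samples.
        picked = eligible[len(eligible) - batch_size:]
        return [self._contexts[i] for i in picked], [self._rewards[i] for i in picked]


def last_two(total_samples: int) -> List[int]:
    """Hypothetical strategy function: only the two newest samples are eligible."""
    return list(range(max(0, total_samples - 2), total_samples))


buffer = ToyListBuffer(last_two)
buffer.add_batch([[1.0], [2.0], [3.0]], [0.0, 1.0, 0.5])
contexts, rewards = buffer.get_batch(2)
print(contexts, rewards)
```

Requesting three samples from this buffer raises a ValueError, because the strategy exposes only two of the three stored samples.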
update_embeddings(embedded_actions)
abstractmethod
Update the embedded actions in the buffer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
embedded_actions | Tensor | New embeddings for all contexts in buffer. Shape: (buffer_size, n_embedding_size). | required |
state_dict()
abstractmethod
Get state dictionary for checkpointing.
Returns:
Type | Description |
---|---|
StateDictType | Dictionary containing all necessary state information for restoring the buffer. |
load_state_dict(state_dict)
abstractmethod
Load state from checkpoint dictionary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state_dict | StateDictType | Dictionary containing state information for restoring the buffer. | required |
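A checkpoint round trip with state_dict and load_state_dict might look like the following sketch. CheckpointableBuffer and the dictionary keys max_size and rewards are hypothetical; the actual fields of the library's state dictionary are not listed here, so treat this purely as an illustration of the pattern.

```python
import copy
from typing import Any, Dict, List, Optional


class CheckpointableBuffer:
    """Illustrative buffer holding only rewards (not the calvera class)."""

    def __init__(self, max_size: Optional[int] = None) -> None:
        self.max_size = max_size
        self.rewards: List[float] = []

    def state_dict(self) -> Dict[str, Any]:
        # Copy the data so later changes to the buffer do not alter the checkpoint.
        return {"max_size": self.max_size, "rewards": copy.deepcopy(self.rewards)}

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        self.max_size = state_dict["max_size"]
        self.rewards = copy.deepcopy(state_dict["rewards"])


original = CheckpointableBuffer(max_size=100)
original.rewards = [1.0, 0.0]
checkpoint = original.state_dict()

restored = CheckpointableBuffer()
restored.load_state_dict(checkpoint)

# Changing the original afterwards leaves the restored buffer untouched.
original.rewards.append(0.5)
print(restored.max_size, restored.rewards)
```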
TensorDataBuffer(retrieval_strategy, max_size=None, device=None)
Bases: AbstractBanditDataBuffer[ActionInputType, BanditStateDict]
In-memory implementation of bandit data buffer.
Known limitations:
- It can't handle a varying number of actions over time.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
retrieval_strategy | DataRetrievalStrategy | Strategy for managing training data selection. | required |
max_size | int \| None | Optional maximum number of samples to store. None means unlimited. | None |
device | device \| None | Device to store data on (default: CPU). | None |
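The table above does not say which samples are discarded once max_size is reached. The sketch below assumes oldest-first eviction, which is only one plausible policy and not confirmed for TensorDataBuffer; make_reward_store is a hypothetical helper built on the standard library's deque.

```python
from collections import deque
from typing import Deque, Optional


def make_reward_store(max_size: Optional[int] = None) -> Deque[float]:
    """Return a store that keeps at most max_size rewards (None means unlimited)."""
    # deque(maxlen=None) is unbounded; with a maxlen, appending to a full deque
    # silently drops the oldest entry (the assumed eviction policy).
    return deque(maxlen=max_size)


bounded = make_reward_store(max_size=3)
unbounded = make_reward_store()
for reward in [0.1, 0.2, 0.3, 0.4]:
    bounded.append(reward)
    unbounded.append(reward)

print(list(bounded), len(unbounded))
```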
ListDataBuffer(retrieval_strategy, max_size=None)
Bases: AbstractBanditDataBuffer[ActionInputType, BanditStateDict]
A list-based implementation of the bandit data buffer.
This implementation stores contextualized actions, optional embedded actions, rewards and chosen_actions in Python lists. The torch.Tensors are not concatenated but kept as separate list entries, and they are stored without changing their location (device).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
retrieval_strategy | DataRetrievalStrategy | Strategy for selecting training samples. | required |
max_size | int \| None | Optional maximum number of samples to store. | None |
Strategies
DataRetrievalStrategy
Bases: Protocol
Protocol defining how training data should be managed in the buffer.
This protocol represents a strategy for determining which data points from the buffer should be used during training. Different implementations can select data in various ways (e.g., all data, most recent data, etc.).
get_training_indices(total_samples)
Get indices of data points to use for training.
For the TensorDataBuffer this has to be deterministic.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
total_samples | int | Total number of samples in the buffer. | required |
Returns:
Type | Description |
---|---|
Tensor | Tensor of indices to use for training. Shape: (n_selected_samples,). |
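One way to read the determinism requirement is that the same total_samples must always yield the same indices. Under that interpretation, even a subsampling strategy can comply if it reseeds its random generator on every call. The sketch below uses a hypothetical SeededSubsampleStrategy and returns a plain list instead of a Tensor so that it runs without dependencies.

```python
import random
from typing import List


class SeededSubsampleStrategy:
    """Hypothetical strategy: a reproducible random subset of the buffer."""

    def __init__(self, n_samples: int, seed: int = 0) -> None:
        self.n_samples = n_samples
        self.seed = seed

    def get_training_indices(self, total_samples: int) -> List[int]:
        # A fresh generator per call means the output depends only on
        # (seed, n_samples, total_samples), not on earlier calls.
        rng = random.Random(self.seed)
        k = min(self.n_samples, total_samples)
        return sorted(rng.sample(range(total_samples), k))


subsample = SeededSubsampleStrategy(n_samples=5, seed=42)
first = subsample.get_training_indices(100)
second = subsample.get_training_indices(100)
print(first == second)
```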
SlidingWindowRetrievalStrategy(window_size)
Bases: DataRetrievalStrategy
Strategy that uses only the last n data points from the buffer for training.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window_size | int | Number of most recent samples to use for training. | required |
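The index arithmetic behind a sliding window can be sketched as follows. The function sliding_window_indices is hypothetical and returns a plain list instead of a Tensor; returning all samples when the buffer holds fewer than window_size entries is an assumption about the edge case, not documented behaviour.

```python
from typing import List


def sliding_window_indices(total_samples: int, window_size: int) -> List[int]:
    """Indices of the last window_size samples, in chronological order."""
    # Clamp at zero so a buffer smaller than the window yields every sample.
    start = max(0, total_samples - window_size)
    return list(range(start, total_samples))


recent = sliding_window_indices(total_samples=10, window_size=3)
small_buffer = sliding_window_indices(total_samples=2, window_size=5)
print(recent, small_buffer)
```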
AllDataRetrievalStrategy
Bases: DataRetrievalStrategy
Strategy that uses all available data points in the buffer for training.