Benchmark
The benchmark is a collection of scripts that can be used to evaluate the performance of a bandit algorithm.
Datasets
A dataset implements the AbstractDataset class. There are currently 7 datasets for the benchmark:

- CovertypeDataset - classification of forest cover types
- ImdbMovieReviews - sentiment classification of movie reviews
- MNIST - classification of 28x28 images of digits
- MovieLens - recommendation of movies
- Statlog (Shuttle) - classification of different modes of the space shuttle
- Tiny ImageNet - a more difficult image classification task for large image networks
- Wheel - a synthetic dataset described below
AbstractDataset(needs_disjoint_contextualization=False)
Bases: ABC, Generic[ActionInputType], Dataset[tuple[ActionInputType, Tensor]]
Abstract class for a dataset that is derived from PyTorch's Dataset class.
Additionally, it provides a reward method for the specific bandit setting.
Subclasses should have the following attributes:

- num_actions: The maximum number of actions available to the agent.
- context_size: The standard size of the context vector. If needs_disjoint_contextualization is True, the number of features should be multiplied by the number of actions.

ActionInputType Generic
The type of the contextualized actions that are input to the bandit.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
needs_disjoint_contextualization | bool | Whether the dataset needs disjoint contextualization. | False |
Source code in src/calvera/benchmark/datasets/abstract_dataset.py
__getitem__(idx)
abstractmethod
Retrieve the item and the associated rewards for a given index.
Returns:
Type | Description |
---|---|
tuple[ActionInputType, Tensor] | A tuple containing the item and the rewards of the different actions. |
Source code in src/calvera/benchmark/datasets/abstract_dataset.py
__len__()
abstractmethod
Return the number of contexts in the dataset.
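To make the expected interface concrete, here is a minimal sketch of a custom subclass. It uses only the attributes and methods described above; the class name, the random data, and the exact form of the returned item are illustrative assumptions, not part of the library, and the import path is inferred from the source path shown above.

```python
import torch
from torch import Tensor

from calvera.benchmark.datasets.abstract_dataset import AbstractDataset


class ToyClassificationDataset(AbstractDataset[Tensor]):
    """Hypothetical dataset: 100 random contexts, 5 classes, reward 1.0 for the correct class."""

    num_actions: int = 5
    context_size: int = 16

    def __init__(self) -> None:
        super().__init__()  # needs_disjoint_contextualization defaults to False
        self.contexts = torch.randn(100, self.context_size)
        self.labels = torch.randint(0, self.num_actions, (100,))

    def __len__(self) -> int:
        return len(self.contexts)

    def __getitem__(self, idx: int) -> tuple[Tensor, Tensor]:
        # Return the item for this index together with the reward of every action.
        rewards = torch.tensor([self.reward(idx, a) for a in range(self.num_actions)])
        return self.contexts[idx], rewards

    def reward(self, idx: int, action: int) -> float:
        return 1.0 if action == int(self.labels[idx]) else 0.0
```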
CovertypeDataset(dest_path='./data')
Bases: AbstractDataset[Tensor]
Loads the Covertype dataset as a PyTorch Dataset from the UCI repository.
More information can be found at https://archive.ics.uci.edu/ml/datasets/covertype.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dest_path | str | Where to store and look for the dataset. | './data' |
Source code in src/calvera/benchmark/datasets/covertype.py
__getitem__(idx)
Return the contextualized actions and rewards for a given index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | int | The index of the context in this dataset. | required |

Returns:
Name | Type | Description |
---|---|---|
contextualized_actions | Tensor | The contextualized actions for the given index. |
rewards | Tensor | The rewards for each action. Retrieved via the reward method. |
Source code in src/calvera/benchmark/datasets/covertype.py
reward(idx, action)
Return the reward for a given index and action.
1.0 if the action is the correct cover type, 0.0 otherwise.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | int | The index of the context in this dataset. | required |
action | int | The action to evaluate. | required |

Returns:
Type | Description |
---|---|
float | 1.0 if the action is the correct cover type, 0.0 otherwise. |
Source code in src/calvera/benchmark/datasets/covertype.py
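A short usage sketch for this dataset. Only names documented on this page are used; the import path is inferred from the source path shown above, and the printed shapes depend on the data rather than being asserted here.

```python
from calvera.benchmark.datasets.covertype import CovertypeDataset

dataset = CovertypeDataset(dest_path="./data")

contextualized_actions, rewards = dataset[0]
print(len(dataset))                   # number of contexts
print(contextualized_actions.shape)   # contextualized actions for index 0
print(rewards.shape)                  # one reward per action

# The same information per action via reward():
print(dataset.reward(0, action=3))    # 1.0 if cover type 3 is correct for context 0, else 0.0
```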
ImdbMovieReviews(dest_path='./data', partition='train', max_len=255, tokenizer=None)
Bases: AbstractDataset[TextActionInputType]
A dataset for the IMDB movie reviews sentiment classification task.
More information can be found at https://ai.stanford.edu/~amaas/data/sentiment/.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dest_path | str | The path to the directory where the dataset is stored. If None, the dataset will be downloaded to the current directory. | './data' |
partition | Literal['train', 'test'] | The partition of the dataset to use. Either "train" or "test". | 'train' |
max_len | int | The maximum length of the input text. If the text is longer than this, it will be truncated. If it is shorter, it will be padded. | 255 |
tokenizer | PreTrainedTokenizer \| None | A tokenizer from the transformers library. | None |
Source code in src/calvera/benchmark/datasets/imdb_reviews.py
__getitem__(idx)
Return the input and reward for the given index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | int | The index of the sample to retrieve. | required |

Returns:
Type | Description |
---|---|
tuple[TextActionInputType, Tensor] | A tuple containing the necessary input for a model from the transformers library and the rewards of the different actions. |
Source code in src/calvera/benchmark/datasets/imdb_reviews.py
reward(idx, action)
Return the reward for the given index and action.
1.0 if the action is the correct sentiment, 0.0 otherwise.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | int | The index of the sample. | required |
action | int | The action to evaluate. | required |
Source code in src/calvera/benchmark/datasets/imdb_reviews.py
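Since tokenizer expects a PreTrainedTokenizer, any Hugging Face tokenizer can be passed in. The checkpoint below is only an example, and the import path is inferred from the source path shown above.

```python
from transformers import AutoTokenizer

from calvera.benchmark.datasets.imdb_reviews import ImdbMovieReviews

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # any PreTrainedTokenizer works
dataset = ImdbMovieReviews(
    dest_path="./data",
    partition="train",
    max_len=255,
    tokenizer=tokenizer,
)

text_input, rewards = dataset[0]    # tokenized review and per-action rewards
print(dataset.reward(0, action=1))  # 1.0 if action 1 is the correct sentiment, else 0.0
```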
MNISTDataset(dest_path='./data')
Bases: AbstractDataset[Tensor]
Loads the MNIST 784 (version=1) dataset as a PyTorch Dataset.
More information can be found at https://www.openml.org/search?type=data&status=active&id=554.
Loads the dataset from OpenML and stores it as PyTorch tensors.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dest_path | str | Where to store the dataset. | './data' |
Source code in src/calvera/benchmark/datasets/mnist.py
__getitem__(idx)
Return the contextualized actions and rewards for a given index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | int | The index of the context in this dataset. | required |

Returns:
Name | Type | Description |
---|---|---|
contextualized_actions | Tensor | The contextualized actions for the given index. |
rewards | Tensor | The rewards for each action. Retrieved via the reward method. |
Source code in src/calvera/benchmark/datasets/mnist.py
reward(idx, action)
Return the reward for a given index and action.
1.0 if the action is the same as the label, 0.0 otherwise.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | int | The index of the context in this dataset. | required |
action | int | The action for which the reward is requested. | required |
Source code in src/calvera/benchmark/datasets/mnist.py
MovieLensDataset(dest_path='./data', svd_rank=20, outer_product=True, k=4, L=200, min_movies=10, version='ml-latest-small', store_features=True)
Bases: AbstractDataset[Tensor]
MovieLens dataset for combinatorial contextual bandits.
The dataset is provided by GroupLens Research (Harper and Konstan, 2015, "The MovieLens Datasets: History and Context"). It contains ratings of movies by different users. We do not use the ratings directly here but only the information that a user has rated, and therefore watched, a movie. More information can be found at https://www.grouplens.org/datasets/movielens/. We build the context from the SVD decomposition of the user-movie matrix; the context is the outer product of the user and movie features.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dest_path | str | The directory where the dataset is / will be stored. | './data' |
svd_rank | int | Rank (number of latent dimensions) for the SVD decomposition. | 20 |
outer_product | bool | Whether to use the outer product of the user and movie features as the context. | True |
k | int | The number of movies to exclude per user. | 4 |
L | int | The number of movies to include in the dataset (the top L movies). | 200 |
min_movies | int | The minimum number of movies a user must have rated to be included in the dataset (after only taking the top L movies). | 10 |
version | Literal['ml-latest-small', 'ml-32m'] | The version of the MovieLens dataset to use. Either "ml-latest-small" or "ml-32m". | 'ml-latest-small' |
store_features | bool | Whether to store the user and movie features. | True |
Source code in src/calvera/benchmark/datasets/movie_lens.py
__getitem__(idx)
Return the contextualized actions and rewards for a given index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | int | The index of the context in this dataset. | required |

Returns:
Name | Type | Description |
---|---|---|
contextualized_actions | Tensor | The contextualized actions for the given index. |
rewards | Tensor | The rewards for each action. Retrieved via the reward method. |
Source code in src/calvera/benchmark/datasets/movie_lens.py
reward(idx, action)
Return the reward for a given index and action.
Returns 1 if the action is in the future, 0 otherwise.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | int | The index of the context in this dataset. | required |
action | int | The action for which the reward is requested. | required |
Source code in src/calvera/benchmark/datasets/movie_lens.py
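A construction sketch using the parameters documented above; the specific values are arbitrary and the import path is inferred from the source path shown above.

```python
from calvera.benchmark.datasets.movie_lens import MovieLensDataset

dataset = MovieLensDataset(
    dest_path="./data",
    svd_rank=20,                 # latent dimensions of the user/movie SVD features
    outer_product=True,          # context = outer product of user and movie features
    k=4,                         # movies excluded (held out) per user
    L=200,                       # number of movies kept as the action set
    min_movies=10,               # minimum rated movies per user
    version="ml-latest-small",
    store_features=True,
)

contextualized_actions, rewards = dataset[0]
# rewards[i] follows reward(idx, i): 1 if movie i is "in the future" for this user, else 0.
```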
StatlogDataset()
Bases: AbstractDataset[Tensor]
Loads the Statlog (Shuttle) dataset as a PyTorch Dataset from the UCI repository.
More information can be found at https://archive.ics.uci.edu/dataset/148/statlog+shuttle.
Loads the dataset from the UCI repository and stores it as PyTorch tensors.
Source code in src/calvera/benchmark/datasets/statlog.py
__getitem__(idx)
Return the contextualized actions and rewards for a given index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | int | The index of the context in this dataset. | required |

Returns:
Name | Type | Description |
---|---|---|
contextualized_actions | Tensor | The contextualized actions for the given index. |
rewards | Tensor | The rewards for each action. Retrieved via the reward method. |
Source code in src/calvera/benchmark/datasets/statlog.py
reward(idx, action)
Return the reward for a given index and action.
Returns 1 if the action is the same as the label, 0 otherwise.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | int | The index of the context in this dataset. | required |
action | int | The action for which the reward is requested. | required |
Source code in src/calvera/benchmark/datasets/statlog.py
TinyImageNetDataset(dest_path='./data', split='train', max_classes=10)
Bases: AbstractDataset[Tensor]
Loads the Tiny ImageNet dataset as a PyTorch Dataset.
More information can be found at https://cs231n.stanford.edu/reports/2015/pdfs/yle_project.pdf.
Tiny ImageNet has 200 classes with 500 training images, 50 validation images, and 50 test images per class. Each image is 64x64 pixels in 3 channels (RGB).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dest_path | str | The directory where the dataset will be stored. | './data' |
split | Literal['train', 'val', 'test'] | Which split to use ('train', 'val', or 'test'). | 'train' |
max_classes | int | The maximum number of classes to use from the dataset. | 10 |
Source code in src/calvera/benchmark/datasets/tiny_imagenet.py
__getitem__(idx)
Return the context (flattened image) and rewards for a given index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | int | The index of the context in this dataset. | required |

Returns:
Name | Type | Description |
---|---|---|
context | Tensor | The flattened image features as the context. |
rewards | Tensor | The rewards for each action (1.0 for the correct class, 0.0 otherwise). |
Source code in src/calvera/benchmark/datasets/tiny_imagenet.py
reward(idx, action)
Return the reward for a given index and action.
1.0 if the action is the same as the label, 0.0 otherwise.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | int | The index of the context in this dataset. | required |
action | int | The action for which the reward is requested. | required |
Source code in src/calvera/benchmark/datasets/tiny_imagenet.py
WheelBanditDataset(num_samples, delta, mu_small=1.0, std_small=0.01, mu_medium=1.2, std_medium=0.01, mu_large=50.0, std_large=0.01, seed=None)
Bases: AbstractDataset[Tensor]
Generates a dataset for the Wheel Bandit problem.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_samples | int | Number of samples to generate. | required |
delta | float | Exploration parameter: high reward in one region if the norm is above delta. | required |
mu_small | float | Mean of the small reward distribution. | 1.0 |
std_small | float | Standard deviation of the small reward distribution. | 0.01 |
mu_medium | float | Mean of the medium reward distribution. | 1.2 |
std_medium | float | Standard deviation of the medium reward distribution. | 0.01 |
mu_large | float | Mean of the large reward distribution. | 50.0 |
std_large | float | Standard deviation of the large reward distribution. | 0.01 |
seed | int \| None | Seed for the random number generator. | None |
Source code in src/calvera/benchmark/datasets/wheel.py
__getitem__(idx)
Return the contextualized actions and rewards for the context at index idx in this dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | int | The index of the context in this dataset. | required |
Source code in src/calvera/benchmark/datasets/wheel.py
reward(idx, action)
Return the reward of the given action for the context at index idx in this dataset.
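A construction sketch for the synthetic Wheel dataset. num_samples, delta, and seed are chosen arbitrarily here, the remaining parameters keep the defaults documented above, and the import path is inferred from the source path shown above.

```python
from calvera.benchmark.datasets.wheel import WheelBanditDataset

dataset = WheelBanditDataset(
    num_samples=2000,
    delta=0.95,   # contexts with norm above delta fall into the high-reward region
    seed=42,
)

contextualized_actions, rewards = dataset[0]
print(dataset.reward(0, action=0))
```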
Environment
BanditBenchmarkEnvironment(dataloader, device=None)
Bases: Generic[ActionInputType]
Environment that iterates over a DataLoader, yielding only contextualized_actions.
Internally it stores the rewards, which can be retrieved by a helper method.
This is used to simulate a bandit environment with delayed feedback where the bandit can only see the actions and not the rewards. The bandit should first sample contextualized_actions by iterating over the environment.
The bandit can then choose the best actions.
Finally, the bandit receives rewards by calling get_feedback(chosen_actions).
Since this is a simulation, the bandit can also compute the regret by calling compute_regret(chosen_actions).
Usage:

```python
environment = BanditBenchmarkEnvironment(dataloader)

for contextualized_actions in environment:
    chosen_actions, p = bandit.forward(contextualized_actions)  # one-hot tensor
    chosen_contextualized_actions, realized_rewards = environment.get_feedback(chosen_actions)
    bandit.record_feedback(chosen_contextualized_actions, realized_rewards)

    # optional: compute regret
    regret = environment.compute_regret(chosen_actions)
```
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataloader | DataLoader[tuple[ActionInputType, Tensor]] | DataLoader that yields batches of (contextualized_actions, all_rewards) tuples. | required |
device | device \| None | The device the tensors should be moved to. If None, the default device is used. | None |
Source code in src/calvera/benchmark/environment.py
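The dataloader can be built from any of the datasets above, since each dataset yields (contextualized_actions, rewards) tuples. A construction sketch follows; the batch size and device are arbitrary choices, and the import paths are inferred from the source paths on this page.

```python
import torch
from torch.utils.data import DataLoader

from calvera.benchmark.datasets.covertype import CovertypeDataset
from calvera.benchmark.environment import BanditBenchmarkEnvironment

dataset = CovertypeDataset(dest_path="./data")
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

environment = BanditBenchmarkEnvironment(
    dataloader,
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)
```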
__iter__()
Returns an iterator object for the BanditBenchmarkEnvironment.
This method initializes an iterator for the dataloader and returns the BanditBenchmarkEnvironment instance itself, allowing it to be used as an iterator in a loop. Needs to be called before the first iteration.
Returns:
Name | Type | Description |
---|---|---|
BanditBenchmarkEnvironment | BanditBenchmarkEnvironment[ActionInputType] | The instance of the environment itself. |
Source code in src/calvera/benchmark/environment.py
__next__()
Returns the next batch of contextualized actions from the DataLoader.
Returns:
Type | Description |
---|---|
ActionInputType | The contextualized actions for the bandit to pick from. |

Raises:
Type | Description |
---|---|
AssertionError | If the iterator was not initialized by calling __iter__() first. |
Source code in src/calvera/benchmark/environment.py
get_feedback(chosen_actions)
Returns the chosen actions & realized rewards of the last batch.
For combinatorial bandits, this feedback is semi-bandit feedback.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
chosen_actions | Tensor | Shape (n, m), one-hot, possibly with multiple "1"s per row. The actions chosen by the bandit. Each row must contain at least one chosen action, and all rows must contain the same number of chosen actions ("1"s). | required |

Returns:
Type | Description |
---|---|
tuple[ActionInputType, Tensor] | BanditFeedbackDataset with the chosen contextualized actions (shape: (n, m, k)) and the realized rewards (shape: (n, m)). |
Source code in src/calvera/benchmark/environment.py
compute_regret(chosen_actions)
Computes the regret for the most recent batch.
Definition:
- best_reward = the maximum achievable reward, i.e., the sum of the rewards of the top i actions (where i is the number of chosen actions)
- chosen_reward = the sum of the rewards of the chosen actions (handles multiple "1"s per row)
- regret = best_reward - chosen_reward

Important: For combinatorial bandits this assumes that the reward of a super-action is the sum of the rewards of the chosen arms.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
chosen_actions | Tensor | Shape (n, k), one-hot, possibly with multiple "1"s per row. The actions chosen by the bandit. Each row must contain at least one chosen action, and all rows must contain the same number of chosen actions ("1"s). | required |

Returns:
Type | Description |
---|---|
Tensor | Tensor of regrets, shape (n,). |
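A small worked example of this definition with made-up rewards for a single round of three actions: if the stored rewards are [0.0, 1.0, 0.5] and the bandit chose action 0, then best_reward = 1.0, chosen_reward = 0.0, and the regret is 1.0. The snippet below re-implements the formula for illustration only; it is not the library's code.

```python
import torch

all_rewards = torch.tensor([[0.0, 1.0, 0.5]])     # per-action rewards the environment stored
chosen_actions = torch.tensor([[1.0, 0.0, 0.0]])  # one-hot: the bandit picked action 0

i = int(chosen_actions.sum(dim=1)[0])                       # number of chosen actions (here 1)
best_reward = all_rewards.topk(i, dim=1).values.sum(dim=1)  # best achievable: sum of top-i rewards
chosen_reward = (all_rewards * chosen_actions).sum(dim=1)   # sum over the chosen actions
regret = best_reward - chosen_reward                        # tensor([1.0])
```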