Benchmark

The benchmark is a collection of scripts that can be used to evaluate the performance of a bandit algorithm.

Datasets

A dataset implements the AbstractDataset class. The benchmark currently provides the following seven datasets; a short usage sketch follows the list:

  • CovertypeDataset - classification of forest cover types

  • ImdbMovieReviews - sentiment classification of movie reviews

  • MNIST - classification of 28x28 images of digits

  • MovieLens - recommendation of movies

  • Statlog (Shuttle) - classification of different modes of the space shuttle

  • Tiny ImageNet - a more challenging image classification task intended for larger image networks

  • Wheel - synthetic dataset described here
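
All datasets follow the same access pattern: indexing into a dataset yields the contextualized actions and a reward vector over all actions. The following is a minimal sketch, assuming the package is installed and the classes are importable from the module paths suggested by the source paths shown below (e.g. calvera.benchmark.datasets.covertype):

import torch

# Hypothetical import path, inferred from src/calvera/benchmark/datasets/covertype.py.
from calvera.benchmark.datasets.covertype import CovertypeDataset

dataset = CovertypeDataset(dest_path="./data")  # downloads the data on first use

contextualized_actions, rewards = dataset[0]
# With disjoint contextualization, each of the num_actions rows contains the
# context placed in its own action-specific feature block.
print(contextualized_actions.shape)  # expected: (num_actions, num_actions * n_features)
print(rewards.shape)                 # expected: (num_actions,) with a single 1.0 entry

# The reward of a single action can also be queried directly.
print(dataset.reward(0, action=0))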


AbstractDataset(needs_disjoint_contextualization=False)

Bases: ABC, Generic[ActionInputType], Dataset[tuple[ActionInputType, Tensor]]

Abstract class for a dataset that is derived from PyTorch's Dataset class.

Additionally, it provides a reward method for the specific bandit setting.

Subclasses should have the following attributes:

  • num_actions: The maximum number of actions available to the agent.

  • context_size: The standard size of the context vector. If needs_disjoint_contextualization is True, the number of features should be multiplied by the number of actions.

ActionInputType Generic

The type of the contextualized actions that are input to the bandit.

Parameters:

  • needs_disjoint_contextualization (bool, default False) - Whether the dataset needs disjoint contextualization.
Source code in src/calvera/benchmark/datasets/abstract_dataset.py
def __init__(self, needs_disjoint_contextualization: bool = False) -> None:
    """Initialize the dataset.

    Args:
        needs_disjoint_contextualization: Whether the dataset needs disjoint contextualization.
    """
    self.contextualizer: MultiClassContextualizer | Callable[[Any], Any]
    if needs_disjoint_contextualization:
        self.contextualizer = MultiClassContextualizer(self.num_actions)
    else:
        self.contextualizer = lambda x: x
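
For intuition, disjoint contextualization turns one shared context into one feature vector per action by placing the context into an action-specific block. The sketch below reproduces the idea with plain PyTorch; it only illustrates the concept and is not the MultiClassContextualizer implementation:

import torch

def disjoint_contextualize(context: torch.Tensor, num_actions: int) -> torch.Tensor:
    """Map a (n_features,) context to (num_actions, num_actions * n_features)."""
    n_features = context.shape[-1]
    out = torch.zeros(num_actions, num_actions * n_features)
    for a in range(num_actions):
        # Place the shared context into the block belonging to action `a`.
        out[a, a * n_features : (a + 1) * n_features] = context
    return out

context = torch.tensor([0.5, -1.0, 2.0])
print(disjoint_contextualize(context, num_actions=2))
# tensor([[ 0.5000, -1.0000,  2.0000,  0.0000,  0.0000,  0.0000],
#         [ 0.0000,  0.0000,  0.0000,  0.5000, -1.0000,  2.0000]])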

__getitem__(idx) abstractmethod

Retrieve the item and the associated rewards for a given index.

Returns:

  • tuple[ActionInputType, Tensor] - A tuple containing the item and the rewards of the different actions.

Source code in src/calvera/benchmark/datasets/abstract_dataset.py
@abstractmethod
def __getitem__(self, idx: int) -> tuple[ActionInputType, torch.Tensor]:
    """Retrieve the item and the associated rewards for a given index.

    Returns:
        A tuple containing the item and the rewards of the different actions.
    """
    pass

__len__() abstractmethod

Return the number of contexts/samples in this dataset.

Source code in src/calvera/benchmark/datasets/abstract_dataset.py
@abstractmethod
def __len__(self) -> int:
    """Return the number of contexts/samples in this dataset."""
    pass

reward(idx, action) abstractmethod

Returns the reward for a given index and action.

Source code in src/calvera/benchmark/datasets/abstract_dataset.py
@abstractmethod
def reward(self, idx: int, action: int) -> float:
    """Returns the reward for a given index and action."""
    pass
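
A minimal, hypothetical subclass illustrating the contract (the class name and data are made up; only the two attributes and the three abstract methods are required, and the import path is inferred from the source path above):

import torch

from calvera.benchmark.datasets.abstract_dataset import AbstractDataset

class ToyDataset(AbstractDataset[torch.Tensor]):
    """Three-class toy problem with random contexts."""

    num_actions = 3
    context_size = 5 * 3  # features * actions because of disjoint contextualization

    def __init__(self) -> None:
        super().__init__(needs_disjoint_contextualization=True)
        self.X = torch.randn(100, 5)
        self.y = torch.randint(0, 3, (100,))

    def __len__(self) -> int:
        return self.X.shape[0]

    def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:
        contextualized_actions = self.contextualizer(self.X[idx].unsqueeze(0)).squeeze(0)
        rewards = torch.tensor([self.reward(idx, a) for a in range(self.num_actions)])
        return contextualized_actions, rewards

    def reward(self, idx: int, action: int) -> float:
        return float(self.y[idx] == action)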

CovertypeDataset(dest_path='./data')

Bases: AbstractDataset[Tensor]

Loads the Covertype dataset as a PyTorch Dataset from the UCI repository.

More information can be found at https://archive.ics.uci.edu/ml/datasets/covertype.

Parameters:

  • dest_path (str, default './data') - Where to store and look for the dataset.
Source code in src/calvera/benchmark/datasets/covertype.py
def __init__(self, dest_path: str = "./data") -> None:
    """Initialize the Covertype dataset. Downloads the dataset from UCI repository if not found at `dest_path`.

    Args:
        dest_path: Where to store and look for the dataset.
    """
    super().__init__(needs_disjoint_contextualization=True)
    self.data = fetch_covtype(data_home=dest_path)
    X_np = self.data.data.astype(np.float32)
    y_np = self.data.target.astype(np.int64)

    self.X = torch.tensor(X_np, dtype=torch.float32)
    self.y = torch.tensor(y_np, dtype=torch.long)

__getitem__(idx)

Return the contextualized actions and rewards for a given index.

Parameters:

  • idx (int, required) - The index of the context in this dataset.

Returns:

  • contextualized_actions (Tensor) - The contextualized actions for the given index.
  • rewards (Tensor) - The rewards for each action, retrieved via self.reward.

Source code in src/calvera/benchmark/datasets/covertype.py
def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Return the contextualized actions and rewards for a given index.

    Args:
        idx: The index of the context in this dataset.

    Returns:
        contextualized_actions: The contextualized actions for the given index.
        rewards: The rewards for each action. Retrieved via `self.reward`.
    """
    context = self.X[idx].reshape(1, -1)
    contextualized_actions = self.contextualizer(context).squeeze(0)
    rewards = torch.tensor(
        [self.reward(idx, action) for action in range(self.num_actions)],
        dtype=torch.float32,
    )

    return contextualized_actions, rewards

reward(idx, action)

Return the reward for a given index and action.

1.0 if the action is the correct cover type, 0.0 otherwise.

Parameters:

  • idx (int, required) - The index of the context in this dataset.
  • action (int, required) - The action to evaluate.

Returns:

  • float - 1.0 if the action is the correct cover type, 0.0 otherwise.

Source code in src/calvera/benchmark/datasets/covertype.py
def reward(self, idx: int, action: int) -> float:
    """Return the reward for a given index and action.

    1.0 if the action is the correct cover type, 0.0 otherwise.

    Args:
        idx: The index of the context in this dataset.
        action: The action to evaluate.

    Returns:
        1.0 if the action is the correct cover type, 0.0 otherwise.
    """
    return float(self.y[idx] == action + 1)

ImdbMovieReviews(dest_path='./data', partition='train', max_len=255, tokenizer=None)

Bases: AbstractDataset[TextActionInputType]

A dataset for the IMDB movie reviews sentiment classification task.

More information can be found at https://ai.stanford.edu/~amaas/data/sentiment/.

Parameters:

  • dest_path (str, default './data') - The path to the directory where the dataset is stored. If None, the dataset will be downloaded to the current directory.
  • partition (Literal['train', 'test'], default 'train') - The partition of the dataset to use. Either "train" or "test".
  • max_len (int, default 255) - The maximum length of the input text. If the text is longer than this, it will be truncated. If it is shorter, it will be padded. This is also the context_size of the dataset.
  • tokenizer (PreTrainedTokenizer | None, default None) - A tokenizer from the transformers library. If None, the BertTokenizer will be used.

Source code in src/calvera/benchmark/datasets/imdb_reviews.py
def __init__(
    self,
    dest_path: str = "./data",
    partition: Literal["train", "test"] = "train",
    max_len: int = 255,
    tokenizer: PreTrainedTokenizer | None = None,
):
    """Initialize the IMDB movie reviews dataset.

    Args:
        dest_path: The path to the directory where the dataset is stored. If None, the dataset will be downloaded
            to the current directory.
        partition: The partition of the dataset to use. Either "train" or "test".
        max_len: The maximum length of the input text. If the text is longer than this, it will be truncated.
        tokenizer: A tokenizer from the `transformers` library. If None, the `BertTokenizer` will be used.
    """
    # Using disjoint contextualization for this dataset does not work. We have a sequence of tokens.
    super().__init__(needs_disjoint_contextualization=False)

    self.data = _setup_dataset(
        partition=partition,
        dest_path=dest_path,
    )

    if tokenizer is None:
        self.tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", padding="max_length", truncation=True)
    else:
        self.tokenizer = tokenizer

__getitem__(idx)

Return the input and reward for the given index.

Parameters:

  • idx (int, required) - The index of the sample to retrieve.

Returns:

  • tuple[TextActionInputType, Tensor] - A tuple containing the necessary input for a model from the transformers library and the reward. Specifically, the input is a tuple containing the input_ids, attention_mask, and token_type_ids.
Source code in src/calvera/benchmark/datasets/imdb_reviews.py
def __getitem__(self, idx: int) -> tuple[TextActionInputType, torch.Tensor]:
    """Return the input and reward for the given index.

    Args:
        idx: The index of the sample to retrieve.

    Returns:
        A tuple containing the necessary input for a model from the `transformers` library and the reward.
        Specifically, the input is a tuple containing the `input_ids`, `attention_mask`, and `token_type_ids`.
        (cmp. [https://huggingface.co/docs/transformers/v4.49.0/en/main_classes/tokenizer#transformers.PreTrainedTokenizer.__call__](https://huggingface.co/docs/transformers/v4.49.0/en/main_classes/tokenizer#transformers.PreTrainedTokenizer.__call__))
    """
    inputs = self.tokenizer(
        self.data["text"][idx],
        None,
        add_special_tokens=True,
        max_length=self.context_size,
        padding="max_length",
        truncation=True,
        return_token_type_ids=True,
    )

    rewards = torch.tensor(
        [self.reward(idx, action) for action in range(self.num_actions)],
        dtype=torch.float,
    )

    return (
        (
            torch.tensor(inputs["input_ids"], dtype=torch.long).unsqueeze(0),
            torch.tensor(inputs["attention_mask"], dtype=torch.long).unsqueeze(0),
            torch.tensor(inputs["token_type_ids"], dtype=torch.long).unsqueeze(0),
        ),
        rewards,
    )
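
A small sketch of what a returned item looks like (the dataset is downloaded on first use; the shapes assume the default max_len of 255 and the import path is inferred from the source path above):

from calvera.benchmark.datasets.imdb_reviews import ImdbMovieReviews

dataset = ImdbMovieReviews(dest_path="./data", partition="train")

(input_ids, attention_mask, token_type_ids), rewards = dataset[0]
print(input_ids.shape)       # expected: (1, 255) - token ids padded/truncated to max_len
print(attention_mask.shape)  # expected: (1, 255)
print(token_type_ids.shape)  # expected: (1, 255)
print(rewards)               # one reward per sentiment class; the correct one is 1.0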

reward(idx, action)

Return the reward for the given index and action.

1.0 if the action is the correct sentiment, 0.0 otherwise.

Parameters:

  • idx (int, required) - The index of the sample.
  • action (int, required) - The action to evaluate.
Source code in src/calvera/benchmark/datasets/imdb_reviews.py
def reward(self, idx: int, action: int) -> float:
    """Return the reward for the given index and action.

    1.0 if the action is the correct sentiment, 0.0 otherwise.

    Args:
        idx: The index of the sample.
        action: The action to evaluate.
    """
    return 1.0 if action == self.data["sentiment"][idx] else 0.0

MNISTDataset(dest_path='./data')

Bases: AbstractDataset[Tensor]

Loads the MNIST 784 (version=1) dataset as a PyTorch Dataset.

More information can be found at https://www.openml.org/search?type=data&status=active&id=554.

Loads the dataset from OpenML and stores it as PyTorch tensors.

Parameters:

  • dest_path (str, default './data') - Where to store the dataset.
Source code in src/calvera/benchmark/datasets/mnist.py
def __init__(self, dest_path: str = "./data") -> None:
    """Initialize the MNIST 784 dataset.

    Loads the dataset from OpenML and stores it as PyTorch tensors.

    Args:
        dest_path: Where to store the dataset
    """
    super().__init__(needs_disjoint_contextualization=True)
    self.data: Bunch = fetch_openml(
        name="mnist_784",
        version=1,
        data_home=dest_path,
        as_frame=False,
    )
    self.X = self.data.data.astype(np.float32)
    self.y = self.data.target.astype(np.int64)
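
Because MNIST uses disjoint contextualization, each 784-dimensional image is expanded into one row per digit class. A quick sketch of the expected shapes, assuming 10 actions and 784 features per image (import path inferred from the source path above):

from calvera.benchmark.datasets.mnist import MNISTDataset

dataset = MNISTDataset(dest_path="./data")  # fetched from OpenML on first use

contextualized_actions, rewards = dataset[0]
print(contextualized_actions.shape)  # expected: (10, 7840) = (num_actions, 10 * 784)
print(rewards.shape)                 # expected: (10,) with 1.0 at the true digit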

__getitem__(idx)

Return the contextualized actions and rewards for a given index.

Parameters:

  • idx (int, required) - The index of the context in this dataset.

Returns:

  • contextualized_actions (Tensor) - The contextualized actions for the given index.
  • rewards (Tensor) - The rewards for each action, retrieved via self.reward.

Source code in src/calvera/benchmark/datasets/mnist.py
def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Return the contextualized actions and rewards for a given index.

    Args:
        idx: The index of the context in this dataset.

    Returns:
        contextualized_actions: The contextualized actions for the given index.
        rewards: The rewards for each action. Retrieved via `self.reward`.
    """
    X_item = torch.tensor(self.X[idx], dtype=torch.float32).unsqueeze(0)
    contextualized_actions = self.contextualizer(X_item).squeeze(0)
    rewards = torch.tensor(
        [self.reward(idx, action) for action in range(self.num_actions)],
        dtype=torch.float32,
    )

    return contextualized_actions, rewards

reward(idx, action)

Return the reward for a given index and action.

1.0 if the action is the same as the label, 0.0 otherwise.

Parameters:

  • idx (int, required) - The index of the context in this dataset.
  • action (int, required) - The action for which the reward is requested.
Source code in src/calvera/benchmark/datasets/mnist.py
def reward(self, idx: int, action: int) -> float:
    """Return the reward for a given index and action.

    1.0 if the action is the same as the label, 0.0 otherwise.

    Args:
        idx: The index of the context in this dataset.
        action: The action for which the reward is requested.
    """
    return float(self.y[idx] == action)

MovieLensDataset(dest_path='./data', svd_rank=20, outer_product=True, k=4, L=200, min_movies=10, version='ml-latest-small', store_features=True)

Bases: AbstractDataset[Tensor]

MovieLens dataset for combinatorial contextual bandits.

The dataset is provided by GroupLens Research, specifically by Harper and Konstan (2015, The MovieLens Datasets: History and Context). It contains ratings of movies by different users. We do not use the ratings directly here, only the information that a user has rated, and therefore watched, a movie. More information can be found at https://www.grouplens.org/datasets/movielens/. We build the context from the SVD decomposition of the user-movie matrix; by default, the context is the outer product of the user and movie features.

Parameters:

  • dest_path (str, default './data') - The directory where the dataset is / will be stored.
  • svd_rank (int, default 20) - Rank (number of latent dimensions) for the SVD decomposition.
  • outer_product (bool, default True) - Whether to use the outer product of the user and movie features as the context. If False, the context will be the concatenation of the user and movie features (might perform better for neural bandits).
  • k (int, default 4) - The number of movies to exclude per user.
  • L (int, default 200) - The number of movies to include in the dataset (the top L most common movies).
  • min_movies (int, default 10) - The minimum number of movies a user must have rated to be included in the dataset (after only taking the top L movies).
  • version (Literal['ml-latest-small', 'ml-32m'], default 'ml-latest-small') - The version of the MovieLens dataset to use.
  • store_features (bool, default True) - Whether to store the user and movie features. If True, the features will be stored in dest_path.
Source code in src/calvera/benchmark/datasets/movie_lens.py
def __init__(
    self,
    dest_path: str = "./data",
    svd_rank: int = 20,
    outer_product: bool = True,
    k: int = 4,
    L: int = 200,
    min_movies: int = 10,
    version: Literal["ml-latest-small", "ml-32m"] = "ml-latest-small",
    store_features: bool = True,
):
    """Initialize the MovieLens dataset.

    Args:
        dest_path: The directory where the dataset is / will be stored.
        svd_rank: Rank (number of latent dimensions) for the SVD decomposition.
        outer_product: Whether to use the outer product of the user and movie features as the context. If `False`,
            the context will be the concatenation of the user and movie features.
            (Might perform better for Neural Bandits).
        k: The number of movies to exclude per user.
        L: The number of movies to include in the dataset. (Top `L` most common movies).
        min_movies: The minimum number of movies a user must have rated to be included in the dataset (after only
            taking the top `L` movies).
        version: The version of the MovieLens dataset to use. Either "ml-latest-small" or "ml-32m".
        store_features: Whether to store the user and movie features.
            If `True`, the features will be stored in `dest_path`.
    """
    super().__init__(needs_disjoint_contextualization=False)
    self.user_features, self.movie_features, self.history, self.F = _setup_movielens(
        dest_path=dest_path,
        svd_rank=svd_rank,
        k=k,
        L=L,
        min_movies=min_movies,
        version=version,
        store_features=store_features,
    )
    self.outer_product = outer_product

    # We can predict k movies per user. The idea is that we only predict a user once.
    self.num_actions = self.history.shape[-1]
    self.num_samples = self.user_features.shape[0]
    self.context_size = (
        self.user_features.shape[-1] * self.movie_features.shape[-1]
        if self.outer_product
        else self.user_features.shape[-1] + self.movie_features.shape[-1]
    )
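
With the default svd_rank of 20, and assuming the SVD assigns svd_rank latent dimensions to both the user and the movie side, the outer-product context has 20 * 20 = 400 features while the concatenated context has 20 + 20 = 40. A rough sketch (import path inferred from the source path above):

from calvera.benchmark.datasets.movie_lens import MovieLensDataset

outer = MovieLensDataset(dest_path="./data", svd_rank=20, outer_product=True)
concat = MovieLensDataset(dest_path="./data", svd_rank=20, outer_product=False)

print(outer.context_size)   # expected: 400 (20 * 20)
print(concat.context_size)  # expected: 40 (20 + 20)

contexts, rewards = outer[0]
print(contexts.shape)  # expected: (num_movies, 400), one context per movie/action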

__getitem__(idx)

Return the contextualized actions and rewards for a given index.

Parameters:

  • idx (int, required) - The index of the context in this dataset.

Returns:

  • contextualized_actions (Tensor) - The contextualized actions for the given index.
  • rewards (Tensor) - The rewards for each action, retrieved via self.reward.

Source code in src/calvera/benchmark/datasets/movie_lens.py
def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Return the contextualized actions and rewards for a given index.

    Args:
        idx: The index of the context in this dataset.

    Returns:
        contextualized_actions: The contextualized actions for the given index.
        rewards: The rewards for each action. Retrieved via `self.reward`.
    """
    # Get available actions (1 - history[userId - 1 = idx])
    available_actions = (1.0 - self.history[idx]).bool()

    # Get the context for each action
    contexts: torch.Tensor

    if self.outer_product:
        contexts = self.user_features[idx].unsqueeze(-1) * self.movie_features.unsqueeze(1)
        contexts = contexts.flatten(1)
    else:
        contexts = torch.cat(
            (
                self.user_features[idx].unsqueeze(0).expand(self.movie_features.size(0), -1),
                self.movie_features,
            ),
            dim=-1,
        )

    return contexts, torch.tensor(
        [
            self.reward(idx, movie_idx)
            for movie_idx in range(self.history.shape[-1])
            if available_actions[movie_idx]
        ],
        dtype=torch.float32,
    )

reward(idx, action)

Return the reward for a given index and action.

Returns 1 if the user watched the chosen movie in the held-out (future) interactions, 0 otherwise.

Parameters:

  • idx (int, required) - The index of the context in this dataset.
  • action (int, required) - The action for which the reward is requested.
Source code in src/calvera/benchmark/datasets/movie_lens.py
def reward(self, idx: int, action: int) -> float:
    """Return the reward for a given index and action.

    Returns 1 if the user watched the chosen movie in the held-out (future) interactions, 0 otherwise.

    Args:
        idx: The index of the context in this dataset.
        action: The action for which the reward is requested.
    """
    # An idx represents a user and the action is a movie.
    return self.F[idx, action].item()

StatlogDataset()

Bases: AbstractDataset[Tensor]

Loads the Statlog (Shuttle) dataset as a PyTorch Dataset from the UCI repository.

More information can be found at https://archive.ics.uci.edu/dataset/148/statlog+shuttle.

Loads the dataset from the UCI repository and stores it as PyTorch tensors.

Source code in src/calvera/benchmark/datasets/statlog.py
def __init__(self) -> None:
    """Initialize the Statlog (Shuttle) dataset.

    Loads the dataset from the UCI repository and stores it as PyTorch tensors.
    """
    super().__init__(needs_disjoint_contextualization=True)
    dataset = fetch_ucirepo(id=148)  # id=148 specifies the Statlog (Shuttle) dataset
    X = dataset.data.features
    y = dataset.data.targets

    self.X = torch.tensor(X.values, dtype=torch.float32)
    self.y = torch.tensor(y.values, dtype=torch.long)

__getitem__(idx)

Return the contextualized actions and rewards for a given index.

Parameters:

  • idx (int, required) - The index of the context in this dataset.

Returns:

  • contextualized_actions (Tensor) - The contextualized actions for the given index.
  • rewards (Tensor) - The rewards for each action, retrieved via self.reward.

Source code in src/calvera/benchmark/datasets/statlog.py
def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Return the contextualized actions and rewards for a given index.

    Args:
        idx: The index of the context in this dataset.

    Returns:
        contextualized_actions: The contextualized actions for the given index.
        rewards: The rewards for each action. Retrieved via `self.reward`.
    """
    contextualized_actions = self.contextualizer(self.X[idx].unsqueeze(0)).squeeze(0)
    rewards = torch.tensor(
        [self.reward(idx, action) for action in range(self.num_actions)],
        dtype=torch.float32,
    )

    return contextualized_actions, rewards

reward(idx, action)

Return the reward for a given index and action.

Returns 1 if the action is the same as the label, 0 otherwise.

Parameters:

  • idx (int, required) - The index of the context in this dataset.
  • action (int, required) - The action for which the reward is requested.
Source code in src/calvera/benchmark/datasets/statlog.py
def reward(self, idx: int, action: int) -> float:
    """Return the reward for a given index and action.

    Returns 1 if the action is the same as the label, 0 otherwise.

    Args:
        idx: The index of the context in this dataset.
        action: The action for which the reward is requested.
    """
    return float(self.y[idx] == action + 1)

TinyImageNetDataset(dest_path='./data', split='train', max_classes=10)

Bases: AbstractDataset[Tensor]

Loads the Tiny ImageNet dataset as a PyTorch Dataset.

More information can be found at https://cs231n.stanford.edu/reports/2015/pdfs/yle_project.pdf.

Tiny ImageNet has 200 classes with 500 training images, 50 validation images, and 50 test images per class. Each image is 64x64 pixels in 3 channels (RGB).

Parameters:

  • dest_path (str, default './data') - The directory where the dataset will be stored.
  • split (Literal['train', 'val', 'test'], default 'train') - Which split to use ('train', 'val', or 'test').
  • max_classes (int, default 10) - The maximum number of classes to use from the dataset.
Source code in src/calvera/benchmark/datasets/tiny_imagenet.py
def __init__(
    self,
    dest_path: str = "./data",
    split: Literal["train", "val", "test"] = "train",
    max_classes: int = 10,
) -> None:
    """Initialize the Tiny ImageNet dataset.

    Args:
        dest_path: The directory where the dataset will be stored.
        split: Which split to use ('train', 'val', or 'test')
        max_classes: The maximum number of classes to use from the dataset. Default is 10.
    """
    super().__init__(needs_disjoint_contextualization=False)

    self.dest_path = dest_path
    self.split = split

    self.image_dataset, self.y = _setup_tinyimagenet(
        dest_path=dest_path,
        split=split,
    )

    if max_classes < 200:
        self.image_dataset.classes = self.image_dataset.classes[:max_classes]
        self.image_dataset.class_to_idx = {c: i for i, c in enumerate(self.image_dataset.classes)}
        # filter out samples from classes not in the first `max_classes`
        self.image_dataset.samples = [
            (path, label) for path, label in self.image_dataset.samples if label < max_classes
        ]
        self.y = self.y[self.y < max_classes]
        self.num_actions = max_classes

    self.idx_to_class = {v: k for k, v in self.image_dataset.class_to_idx.items()}

    self.X = None

__getitem__(idx)

Return the context (flattened image) and rewards for a given index.

Parameters:

  • idx (int, required) - The index of the context in this dataset.

Returns:

  • context (Tensor) - The flattened image features as the context.
  • rewards (Tensor) - The rewards for each action (1.0 for the correct class, 0.0 otherwise).

Source code in src/calvera/benchmark/datasets/tiny_imagenet.py
def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Return the context (flattened image) and rewards for a given index.

    Args:
        idx: The index of the context in this dataset.

    Returns:
        context: The flattened image features as the context.
        rewards: The rewards for each action (1.0 for correct class, 0.0 otherwise).
    """
    image_tensor, _ = self.image_dataset[idx]

    # Flatten the image tensor from (C, H, W) to (C*H*W)
    context = image_tensor.view(1, -1)  # shape: (1, 3*64*64)

    rewards = torch.tensor(
        [self.reward(idx, action) for action in range(self.num_actions)],
        dtype=torch.float32,
    )

    return context, rewards

reward(idx, action)

Return the reward for a given index and action.

1.0 if the action is the same as the label, 0.0 otherwise.

Parameters:

  • idx (int, required) - The index of the context in this dataset.
  • action (int, required) - The action for which the reward is requested.
Source code in src/calvera/benchmark/datasets/tiny_imagenet.py
def reward(self, idx: int, action: int) -> float:
    """Return the reward for a given index and action.

    1.0 if the action is the same as the label, 0.0 otherwise.

    Args:
        idx: The index of the context in this dataset.
        action: The action for which the reward is requested.
    """
    return float(self.y[idx] == action)

WheelBanditDataset(num_samples, delta, mu_small=1.0, std_small=0.01, mu_medium=1.2, std_medium=0.01, mu_large=50.0, std_large=0.01, seed=None)

Bases: AbstractDataset[Tensor]

Generates a dataset for the Wheel Bandit problem.

Parameters:

  • num_samples (int, required) - Number of samples to generate.
  • delta (float, required) - Exploration parameter: high reward in one region if the norm is above delta.
  • mu_small (float, default 1.0) - Mean of the small reward distribution.
  • std_small (float, default 0.01) - Standard deviation of the small reward distribution.
  • mu_medium (float, default 1.2) - Mean of the medium reward distribution.
  • std_medium (float, default 0.01) - Standard deviation of the medium reward distribution.
  • mu_large (float, default 50.0) - Mean of the large reward distribution.
  • std_large (float, default 0.01) - Standard deviation of the large reward distribution.
  • seed (int | None, default None) - Seed for the random number generator.
Source code in src/calvera/benchmark/datasets/wheel.py
def __init__(
    self,
    num_samples: int,
    delta: float,
    mu_small: float = 1.0,
    std_small: float = 0.01,
    mu_medium: float = 1.2,
    std_medium: float = 0.01,
    mu_large: float = 50.0,
    std_large: float = 0.01,
    seed: int | None = None,
) -> None:
    """Initialize the Wheel Bandit dataset.

    Args:
        num_samples: Number of samples to generate.
        delta: Exploration parameter: high reward in one region if norm above delta
        mu_small: Mean of the small reward distribution.
        std_small: Standard deviation of the small reward distribution.
        mu_medium: Mean of the medium reward distribution.
        std_medium: Standard deviation of the medium reward distribution.
        mu_large: Mean of the large reward distribution.
        std_large: Standard deviation of the large reward distribution.
        seed: Seed for the random number generator.
    """
    super().__init__(needs_disjoint_contextualization=True)

    self.num_samples = num_samples
    self.delta = delta

    # Reward distributions
    self.mu_small = mu_small
    self.std_small = std_small
    self.mu_medium = mu_medium
    self.std_medium = std_medium
    self.mu_large = mu_large
    self.std_large = std_large

    data, rewards = self._generate_data(seed)
    self.data = data
    self.rewards = rewards
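
A short sketch of generating a Wheel dataset. In the standard wheel bandit setup the context is 2-dimensional and there are 5 actions, so the disjointly contextualized actions would have 5 rows of 10 features each; treat these numbers as assumptions about this implementation (import path inferred from the source path above):

from calvera.benchmark.datasets.wheel import WheelBanditDataset

dataset = WheelBanditDataset(num_samples=1000, delta=0.8, seed=42)

contextualized_actions, rewards = dataset[0]
print(contextualized_actions.shape)  # expected: (5, 10) for 5 actions and a 2-d context
print(rewards.shape)                 # expected: (5,)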

__getitem__(idx)

Return the contextualized actions and rewards for the context at index idx in this dataset.

Parameters:

  • idx (int, required) - The index of the context in this dataset.
Source code in src/calvera/benchmark/datasets/wheel.py
def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Return the contextualized actions and rewards for the context at index idx in this dataset.

    Args:
        idx: The index of the context in this dataset.
    """
    contextualized_actions = self.contextualizer(self.data[idx].unsqueeze(0)).squeeze(0)
    rewards = self.rewards[idx]

    return contextualized_actions, rewards

reward(idx, action)

Return the reward of the given action for the context at index idx in this dataset.

Source code in src/calvera/benchmark/datasets/wheel.py
def reward(self, idx: int, action: int) -> float:
    """Return the reward of the given action for the context at index idx in this dataset."""
    return self.rewards[idx, action].item()

Environment

BanditBenchmarkEnvironment(dataloader, device=None)

Bases: Generic[ActionInputType]

Environment that iterates over a DataLoader, yielding only contextualized_actions.

Internally stores rewards, which can be retrieved by a helper method. This is used to simulate a bandit environment with delayed feedback where the bandit can only see the actions and not the rewards. The bandit should first sample contextualized_actions by iterating over the environment. The bandit can then choose the best actions. Finally, the bandit receives rewards by calling get_feedback(chosen_actions). Since this is a simulation, the bandit can also compute the regret by calling compute_regret(chosen_actions).

Usage:

environment = BanditBenchmarkEnvironment(dataloader)
for contextualized_actions in environment:
    chosen_actions, p = bandit.forward(contextualized_actions)  # one-hot tensor
    chosen_contextualized_actions, realized_rewards = environment.get_feedback(chosen_actions)
    bandit.record_feedback(chosen_contextualized_actions, realized_rewards)

    # optional: compute regret
    regret = environment.compute_regret(chosen_actions)

Parameters:

  • dataloader (DataLoader[tuple[ActionInputType, Tensor]], required) - DataLoader that yields batches of (contextualized_actions, all_rewards) tuples.
  • device (torch.device | None, default None) - The device the tensors should be moved to. If None, the default device is used.
Source code in src/calvera/benchmark/environment.py
def __init__(
    self,
    dataloader: DataLoader[tuple[ActionInputType, torch.Tensor]],
    device: torch.device | None = None,
) -> None:
    """Initializes a BanditBenchmarkEnvironment.

    Args:
        dataloader: DataLoader that yields batches of (contextualized_actions, all_rewards) tuples.
        device: The device the tensors should be moved to. If None, the default device is used.
    """
    self._dataloader: DataLoader[tuple[ActionInputType, torch.Tensor]] = dataloader
    self._iterator: _BaseDataLoaderIter | None = None
    self._last_contextualized_actions: ActionInputType | None = None
    self._last_all_rewards: torch.Tensor | None = None
    self.device = device
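
A sketch of wiring a dataset into the environment. The DataLoader simply batches (contextualized_actions, all_rewards) pairs; the import paths are inferred from the source paths in this document:

import torch
from torch.utils.data import DataLoader

from calvera.benchmark.datasets.covertype import CovertypeDataset
from calvera.benchmark.environment import BanditBenchmarkEnvironment

dataset = CovertypeDataset(dest_path="./data")
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
environment = BanditBenchmarkEnvironment(dataloader, device=device)

for contextualized_actions in environment:
    # contextualized_actions has shape (batch_size, num_actions, context_size);
    # pick actions with your bandit here, then call environment.get_feedback(...).
    break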

__iter__()

Returns an iterator object for the BanditBenchmarkEnvironment.

This method initializes an iterator for the dataloader and returns the BanditBenchmarkEnvironment instance itself, allowing it to be used as an iterator in a loop. Needs to be called before the first iteration.

Returns:

  • BanditBenchmarkEnvironment[ActionInputType] - The instance of the environment itself.

Source code in src/calvera/benchmark/environment.py
def __iter__(self) -> "BanditBenchmarkEnvironment[ActionInputType]":
    """Returns an iterator object for the BanditBenchmarkEnvironment.

    This method initializes an iterator for the dataloader and returns the
    BanditBenchmarkEnvironment instance itself, allowing it to be used as an
    iterator in a loop. Needs to be called before the first iteration.

    Returns:
        BanditBenchmarkEnvironment: The instance of the environment itself.
    """
    self._iterator = iter(self._dataloader)
    return self

__next__()

Returns the next batch of contextualized actions from the DataLoader.

Returns:

  • ActionInputType - The contextualized actions for the bandit to pick from.

Raises:

  • AssertionError - If the iterator is not initialized with __iter__.

Source code in src/calvera/benchmark/environment.py
def __next__(self) -> ActionInputType:
    """Returns the next batch of contextualized actions from the DataLoader.

    Returns:
        The contextualized actions for the bandit to pick from.

    Raises:
        AssertionError: If the iterator is not initialized with `__iter__`.
    """
    assert self._iterator is not None, "No iterator was created."

    # Retrieve one batch from the DataLoader
    batch = next(self._iterator)
    contextualized_actions: ActionInputType = batch[0]
    all_rewards: torch.Tensor = batch[1].to(device=self.device)

    if isinstance(contextualized_actions, torch.Tensor):
        batch_size, num_actions = contextualized_actions.shape[:2]
        contextualized_actions = cast(ActionInputType, contextualized_actions.to(device=self.device))
    elif isinstance(contextualized_actions, tuple | list):
        contextualized_actions = cast(
            ActionInputType,
            tuple(action_tensor.to(device=self.device) for action_tensor in contextualized_actions),
        )
        batch_size, num_actions = contextualized_actions[0].shape[:2]
    else:
        raise ValueError(
            f"contextualized_actions must be a torch.Tensor or a tuple. Received {type(contextualized_actions)}."
        )

    assert batch_size == all_rewards.size(0), (
        f"Mismatched batch size of contextualized_actions and all_rewards tensors."
        f"Received {batch_size} and {all_rewards.size(0)}."
    )
    assert num_actions == all_rewards.size(1) or num_actions == 1, (
        f"Mismatched number of actions in contextualized_actions and all_rewards tensors."
        f"Received {num_actions} and {all_rewards.size(1)}."
    )

    # Store them so we can fetch them later when building the update dataset
    self._last_contextualized_actions = contextualized_actions
    self._last_all_rewards = all_rewards
    # Return only the contextualized actions for the bandit to pick from
    return contextualized_actions

get_feedback(chosen_actions)

Returns the chosen actions & realized rewards of the last batch.

For combinatorial bandits, this feedback is semi-bandit feedback.

Parameters:

  • chosen_actions (Tensor, required) - Shape (n, m), one-hot, possibly with multiple "1"s. The actions chosen by the bandit. Every row must contain at least one chosen action ("1"), and all rows must contain the same number of chosen actions.

Returns:

  • tuple[ActionInputType, Tensor] - A tuple of the chosen contextualized actions (shape: (n, m, k)) and the realized rewards (shape: (n, m)).

Source code in src/calvera/benchmark/environment.py
def get_feedback(self, chosen_actions: torch.Tensor) -> tuple[ActionInputType, torch.Tensor]:
    """Returns the chosen actions & realized rewards of the last batch.

    For combinatorial bandits, this feedback is semi-bandit feedback.

    Args:
        chosen_actions: shape (n, m) (one-hot, possibly multiple "1"s). The actions chosen by the bandit. Must
            contain at least one and the same number of chosen actions ("1s") for all rows.

    Returns:
        A tuple of the chosen contextualized actions (shape: (n, m, k)) and the realized rewards (shape: (n, m)).
    """
    self._validate_chosen_actions(chosen_actions)

    chosen_contextualized_actions = self._get_chosen_contextualized_actions(chosen_actions)
    realized_rewards = self._get_realized_rewards(chosen_actions)

    return (
        chosen_contextualized_actions,
        realized_rewards,
    )

compute_regret(chosen_actions)

Computes the regret for the most recent batch.

Definition

best_reward = max over top i actions (where i is the number of chosen actions)
chosen_reward = sum over chosen actions (handles multiple 1s per row)
regret = best_reward - chosen_reward

Important: For combinatorial bandits, this assumes that the reward of a super-action is the sum of the rewards of its chosen arms.

Parameters:

  • chosen_actions (Tensor, required) - Shape (n, k), one-hot, possibly with multiple "1"s. The actions chosen by the bandit. Every row must contain at least one chosen action ("1"), and all rows must contain the same number of chosen actions.

Returns:

  • Tensor - Tensor of regrets, shape (n,).

Source code in src/calvera/benchmark/environment.py
def compute_regret(self, chosen_actions: torch.Tensor) -> torch.Tensor:
    """Computes the regret for the most recent batch.

    Definition:
      best_reward = max over top i actions (where i is the number of chosen actions)
      chosen_reward = sum over chosen actions (handles multiple 1s per row)
      regret = best_reward - chosen_reward
    Important: For combinatorial bandits assumes that the reward of a super-action is the sum of each chosen arm.

    Args:
        chosen_actions: shape (n, k), one-hot, possibly multiple "1"s. The actions chosen by the bandit. Must
            contain at least one and the same number of chosen actions ("1s") for all rows.

    Returns:
        Tensor of regrets shape (n, ).
    """
    self._validate_chosen_actions(chosen_actions)

    best_action_rewards = self._get_best_action_rewards(chosen_actions).sum(dim=1)
    chosen_reward = self._get_realized_rewards(chosen_actions).sum(dim=1)

    eps = 1e-4
    assert torch.all(best_action_rewards >= chosen_reward - eps), (
        "Best action rewards should be greater than chosen rewards. "
        f"Best: {best_action_rewards}, Chosen: {chosen_reward}"
    )

    regret = best_action_rewards - chosen_reward
    # clamp to 0 to avoid negative regrets
    regret = torch.clamp(regret, min=0.0)
    return regret
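
As a concrete illustration of the definition above, using plain tensors rather than the environment API: if a sample had per-action rewards [0.1, 0.9, 0.4] and the bandit chose action 0, the best achievable reward is 0.9, the realized reward is 0.1, and the regret is 0.8.

import torch

all_rewards = torch.tensor([[0.1, 0.9, 0.4]])     # rewards of every action, one row per sample
chosen_actions = torch.tensor([[1.0, 0.0, 0.0]])  # one-hot choice: action 0

chosen_reward = (all_rewards * chosen_actions).sum(dim=1)      # tensor([0.1000])
best_reward = all_rewards.topk(k=1, dim=1).values.sum(dim=1)   # tensor([0.9000])
regret = torch.clamp(best_reward - chosen_reward, min=0.0)     # tensor([0.8000])
print(regret)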