Benchmark

The benchmark is a collection of scripts that can be used to evaluate the performance of a bandit algorithm.

Datasets

A dataset implements the AbstractDataset class. The benchmark currently provides the following seven datasets; a short usage sketch follows the list:

  • CovertypeDataset - classification of forest cover types

  • ImdbMovieReviews - sentiment classification of movie reviews

  • MNIST - classification of 28x28 images of digits

  • MovieLens - recommendation of movies

  • Statlog (Shuttle) - classification of different modes of the space shuttle

  • Tiny ImageNet - a more challenging image classification task intended for larger image networks

  • Wheel - synthetic dataset described here
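
All datasets follow the same access pattern: indexing into a dataset yields the contextualized actions and a reward vector over all actions. The following is a minimal sketch, assuming the package is installed and the classes are importable from the module paths suggested by the source paths shown below (e.g. calvera.benchmark.datasets.covertype):

import torch

# Hypothetical import path, inferred from src/calvera/benchmark/datasets/covertype.py.
from calvera.benchmark.datasets.covertype import CovertypeDataset

dataset = CovertypeDataset(dest_path="./data")  # downloads the data on first use

contextualized_actions, rewards = dataset[0]
# With disjoint contextualization, each of the num_actions rows contains the
# context placed in its own action-specific feature block.
print(contextualized_actions.shape)  # expected: (num_actions, num_actions * n_features)
print(rewards.shape)                 # expected: (num_actions,) with a single 1.0 entry

# The reward of a single action can also be queried directly.
print(dataset.reward(0, action=0))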


AbstractDataset(needs_disjoint_contextualization=False)

Bases: ABC, Generic[ActionInputType], Dataset[tuple[ActionInputType, Tensor]]

Abstract class for a dataset that is derived from PyTorch's Dataset class.

Additionally, it provides a reward method for the specific bandit setting.

Subclasses should have the following attributes:

  • num_actions: The maximum number of actions available to the agent.

  • context_size: The standard size of the context vector. If needs_disjoint_contextualization is True, the number of features should be multiplied by the number of actions.

ActionInputType Generic

The type of the contextualized actions that are input to the bandit.

Parameters:

  • needs_disjoint_contextualization (bool, default False) - Whether the dataset needs disjoint contextualization.
Source code in src/calvera/benchmark/datasets/abstract_dataset.py
def __init__(self, needs_disjoint_contextualization: bool = False) -> None:
    """Initialize the dataset.

    Args:
        needs_disjoint_contextualization: Whether the dataset needs disjoint contextualization.
    """
    self.contextualizer: MultiClassContextualizer | Callable[[Any], Any]
    if needs_disjoint_contextualization:
        self.contextualizer = MultiClassContextualizer(self.num_actions)
    else:
        self.contextualizer = lambda x: x
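
For intuition, disjoint contextualization turns one shared context into one feature vector per action by placing the context into an action-specific block. The sketch below reproduces the idea with plain PyTorch; it only illustrates the concept and is not the MultiClassContextualizer implementation:

import torch

def disjoint_contextualize(context: torch.Tensor, num_actions: int) -> torch.Tensor:
    """Map a (n_features,) context to (num_actions, num_actions * n_features)."""
    n_features = context.shape[-1]
    out = torch.zeros(num_actions, num_actions * n_features)
    for a in range(num_actions):
        # Place the shared context into the block belonging to action `a`.
        out[a, a * n_features : (a + 1) * n_features] = context
    return out

context = torch.tensor([0.5, -1.0, 2.0])
print(disjoint_contextualize(context, num_actions=2))
# tensor([[ 0.5000, -1.0000,  2.0000,  0.0000,  0.0000,  0.0000],
#         [ 0.0000,  0.0000,  0.0000,  0.5000, -1.0000,  2.0000]])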

__getitem__(idx) abstractmethod

Retrieve the item and the associated rewards for a given index.

Returns:

  • tuple[ActionInputType, Tensor] - A tuple containing the item and the rewards of the different actions.

Source code in src/calvera/benchmark/datasets/abstract_dataset.py
@abstractmethod
def __getitem__(self, idx: int) -> tuple[ActionInputType, torch.Tensor]:
    """Retrieve the item and the associated rewards for a given index.

    Returns:
        A tuple containing the item and the rewards of the different actions.
    """
    pass

__len__() abstractmethod

Return the number of contexts/samples in this dataset.

Source code in src/calvera/benchmark/datasets/abstract_dataset.py
@abstractmethod
def __len__(self) -> int:
    """Return the number of contexts/samples in this dataset."""
    pass

reward(idx, action) abstractmethod

Returns the reward for a given index and action.

Source code in src/calvera/benchmark/datasets/abstract_dataset.py
@abstractmethod
def reward(self, idx: int, action: int) -> float:
    """Returns the reward for a given index and action."""
    pass
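
A minimal, hypothetical subclass illustrating the contract (the class name and data are made up; only the two attributes and the three abstract methods are required, and the import path is inferred from the source path above):

import torch

from calvera.benchmark.datasets.abstract_dataset import AbstractDataset

class ToyDataset(AbstractDataset[torch.Tensor]):
    """Three-class toy problem with random contexts."""

    num_actions = 3
    context_size = 5 * 3  # features * actions because of disjoint contextualization

    def __init__(self) -> None:
        super().__init__(needs_disjoint_contextualization=True)
        self.X = torch.randn(100, 5)
        self.y = torch.randint(0, 3, (100,))

    def __len__(self) -> int:
        return self.X.shape[0]

    def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:
        contextualized_actions = self.contextualizer(self.X[idx].unsqueeze(0)).squeeze(0)
        rewards = torch.tensor([self.reward(idx, a) for a in range(self.num_actions)])
        return contextualized_actions, rewards

    def reward(self, idx: int, action: int) -> float:
        return float(self.y[idx] == action)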

CovertypeDataset(dest_path='./data')

Bases: AbstractDataset[Tensor]

Loads the Covertype dataset as a PyTorch Dataset from the UCI repository.

More information can be found at https://archive.ics.uci.edu/ml/datasets/covertype.

Parameters:

  • dest_path (str, default './data') - Where to store and look for the dataset.
Source code in src/calvera/benchmark/datasets/covertype.py
def __init__(self, dest_path: str = "./data") -> None:
    """Initialize the Covertype dataset. Downloads the dataset from UCI repository if not found at `dest_path`.

    Args:
        dest_path: Where to store and look for the dataset.
    """
    super().__init__(needs_disjoint_contextualization=True)
    self.data = fetch_covtype(data_home=dest_path)
    X_np = self.data.data.astype(np.float32)
    y_np = self.data.target.astype(np.int64)

    self.X = torch.tensor(X_np, dtype=torch.float32)
    self.y = torch.tensor(y_np, dtype=torch.long)

__getitem__(idx)

Return the contextualized actions and rewards for a given index.

Parameters:

  • idx (int, required) - The index of the context in this dataset.

Returns:

  • contextualized_actions (Tensor) - The contextualized actions for the given index.
  • rewards (Tensor) - The rewards for each action, retrieved via self.reward.

Source code in src/calvera/benchmark/datasets/covertype.py
def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Return the contextualized actions and rewards for a given index.

    Args:
        idx: The index of the context in this dataset.

    Returns:
        contextualized_actions: The contextualized actions for the given index.
        rewards: The rewards for each action. Retrieved via `self.reward`.
    """
    context = self.X[idx].reshape(1, -1)
    contextualized_actions = self.contextualizer(context).squeeze(0)
    rewards = torch.tensor(
        [self.reward(idx, action) for action in range(self.num_actions)],
        dtype=torch.float32,
    )

    return contextualized_actions, rewards

reward(idx, action)

Return the reward for a given index and action.

1.0 if the action is the correct cover type, 0.0 otherwise.

Parameters:

  • idx (int, required) - The index of the context in this dataset.
  • action (int, required) - The action to evaluate.

Returns:

  • float - 1.0 if the action is the correct cover type, 0.0 otherwise.

Source code in src/calvera/benchmark/datasets/covertype.py
def reward(self, idx: int, action: int) -> float:
    """Return the reward for a given index and action.

    1.0 if the action is the correct cover type, 0.0 otherwise.

    Args:
        idx: The index of the context in this dataset.
        action: The action to evaluate.

    Returns:
        1.0 if the action is the correct cover type, 0.0 otherwise.
    """
    return float(self.y[idx] == action + 1)

ImdbMovieReviews(dest_path='./data', partition='train', max_len=255, tokenizer=None)

Bases: AbstractDataset[TextActionInputType]

A dataset for the IMDB movie reviews sentiment classification task.

More information can be found at https://ai.stanford.edu/~amaas/data/sentiment/.

Parameters:

  • dest_path (str, default './data') - The path to the directory where the dataset is stored. If None, the dataset will be downloaded to the current directory.
  • partition (Literal['train', 'test'], default 'train') - The partition of the dataset to use. Either "train" or "test".
  • max_len (int, default 255) - The maximum length of the input text. If the text is longer than this, it will be truncated. If it is shorter, it will be padded. This is also the context_size of the dataset.
  • tokenizer (PreTrainedTokenizer | None, default None) - A tokenizer from the transformers library. If None, the BertTokenizer will be used.

Source code in src/calvera/benchmark/datasets/imdb_reviews.py
def __init__(
    self,
    dest_path: str = "./data",
    partition: Literal["train", "test"] = "train",
    max_len: int = 255,
    tokenizer: PreTrainedTokenizer | None = None,
):
    """Initialize the IMDB movie reviews dataset.

    Args:
        dest_path: The path to the directory where the dataset is stored. If None, the dataset will be downloaded
            to the current directory.
        partition: The partition of the dataset to use. Either "train" or "test".
        max_len: The maximum length of the input text. If the text is longer than this, it will be truncated.
        tokenizer: A tokenizer from the `transformers` library. If None, the `BertTokenizer` will be used.
    """
    # Using disjoint contextualization for this dataset does not work. We have a sequence of tokens.
    super().__init__(needs_disjoint_contextualization=False)

    self.data = _setup_dataset(
        partition=partition,
        dest_path=dest_path,
    )

    if tokenizer is None:
        self.tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", padding="max_length", truncation=True)
    else:
        self.tokenizer = tokenizer

__getitem__(idx)

Return the input and reward for the given index.

Parameters:

  • idx (int, required) - The index of the sample to retrieve.

Returns:

  • tuple[TextActionInputType, Tensor] - A tuple containing the necessary input for a model from the transformers library and the reward. Specifically, the input is a tuple containing the input_ids, attention_mask, and token_type_ids.
Source code in src/calvera/benchmark/datasets/imdb_reviews.py
def __getitem__(self, idx: int) -> tuple[TextActionInputType, torch.Tensor]:
    """Return the input and reward for the given index.

    Args:
        idx: The index of the sample to retrieve.

    Returns:
        A tuple containing the necessary input for a model from the `transformers` library and the reward.
        Specifically, the input is a tuple containing the `input_ids`, `attention_mask`, and `token_type_ids`.
        (cmp. [https://huggingface.co/docs/transformers/v4.49.0/en/main_classes/tokenizer#transformers.PreTrainedTokenizer.__call__](https://huggingface.co/docs/transformers/v4.49.0/en/main_classes/tokenizer#transformers.PreTrainedTokenizer.__call__))
    """
    inputs = self.tokenizer(
        self.data["text"][idx],
        None,
        add_special_tokens=True,
        max_length=self.context_size,
        padding="max_length",
        truncation=True,
        return_token_type_ids=True,
    )

    rewards = torch.tensor(
        [self.reward(idx, action) for action in range(self.num_actions)],
        dtype=torch.float,
    )

    return (
        (
            torch.tensor(inputs["input_ids"], dtype=torch.long).unsqueeze(0),
            torch.tensor(inputs["attention_mask"], dtype=torch.long).unsqueeze(0),
            torch.tensor(inputs["token_type_ids"], dtype=torch.long).unsqueeze(0),
        ),
        rewards,
    )
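
A small sketch of what a returned item looks like (the dataset is downloaded on first use; the shapes assume the default max_len of 255 and the import path is inferred from the source path above):

from calvera.benchmark.datasets.imdb_reviews import ImdbMovieReviews

dataset = ImdbMovieReviews(dest_path="./data", partition="train")

(input_ids, attention_mask, token_type_ids), rewards = dataset[0]
print(input_ids.shape)       # expected: (1, 255) - token ids padded/truncated to max_len
print(attention_mask.shape)  # expected: (1, 255)
print(token_type_ids.shape)  # expected: (1, 255)
print(rewards)               # one reward per sentiment class; the correct one is 1.0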

reward(idx, action)

Return the reward for the given index and action.

1.0 if the action is the correct sentiment, 0.0 otherwise.

Parameters:

  • idx (int, required) - The index of the sample.
  • action (int, required) - The action to evaluate.
Source code in src/calvera/benchmark/datasets/imdb_reviews.py
def reward(self, idx: int, action: int) -> float:
    """Return the reward for the given index and action.

    1.0 if the action is the correct sentiment, 0.0 otherwise.

    Args:
        idx: The index of the sample.
        action: The action to evaluate.
    """
    return 1.0 if action == self.data["sentiment"][idx] else 0.0

MNISTDataset(dest_path='./data')

Bases: AbstractDataset[Tensor]

Loads the MNIST 784 (version=1) dataset as a PyTorch Dataset.

More information can be found at https://www.openml.org/search?type=data&status=active&id=554.

Loads the dataset from OpenML and stores it as PyTorch tensors.

Parameters:

  • dest_path (str, default './data') - Where to store the dataset.
Source code in src/calvera/benchmark/datasets/mnist.py
def __init__(self, dest_path: str = "./data") -> None:
    """Initialize the MNIST 784 dataset.

    Loads the dataset from OpenML and stores it as PyTorch tensors.

    Args:
        dest_path: Where to store the dataset
    """
    super().__init__(needs_disjoint_contextualization=True)
    self.data: Bunch = fetch_openml(
        name="mnist_784",
        version=1,
        data_home=dest_path,
        as_frame=False,
    )
    self.X = self.data.data.astype(np.float32)
    self.y = self.data.target.astype(np.int64)
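
Because MNIST uses disjoint contextualization, each 784-dimensional image is expanded into one row per digit class. A quick sketch of the expected shapes, assuming 10 actions and 784 features per image (import path inferred from the source path above):

from calvera.benchmark.datasets.mnist import MNISTDataset

dataset = MNISTDataset(dest_path="./data")  # fetched from OpenML on first use

contextualized_actions, rewards = dataset[0]
print(contextualized_actions.shape)  # expected: (10, 7840) = (num_actions, 10 * 784)
print(rewards.shape)                 # expected: (10,) with 1.0 at the true digit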

__getitem__(idx)

Return the contextualized actions and rewards for a given index.

Parameters:

  • idx (int, required) - The index of the context in this dataset.

Returns:

  • contextualized_actions (Tensor) - The contextualized actions for the given index.
  • rewards (Tensor) - The rewards for each action, retrieved via self.reward.

Source code in src/calvera/benchmark/datasets/mnist.py
def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Return the contextualized actions and rewards for a given index.

    Args:
        idx: The index of the context in this dataset.

    Returns:
        contextualized_actions: The contextualized actions for the given index.
        rewards: The rewards for each action. Retrieved via `self.reward`.
    """
    X_item = torch.tensor(self.X[idx], dtype=torch.float32).unsqueeze(0)
    contextualized_actions = self.contextualizer(X_item).squeeze(0)
    rewards = torch.tensor(
        [self.reward(idx, action) for action in range(self.num_actions)],
        dtype=torch.float32,
    )

    return contextualized_actions, rewards

reward(idx, action)

Return the reward for a given index and action.

1.0 if the action is the same as the label, 0.0 otherwise.

Parameters:

  • idx (int, required) - The index of the context in this dataset.
  • action (int, required) - The action for which the reward is requested.
Source code in src/calvera/benchmark/datasets/mnist.py
def reward(self, idx: int, action: int) -> float:
    """Return the reward for a given index and action.

    1.0 if the action is the same as the label, 0.0 otherwise.

    Args:
        idx: The index of the context in this dataset.
        action: The action for which the reward is requested.
    """
    return float(self.y[idx] == action)

MovieLensDataset(dest_path='./data', svd_rank=20, outer_product=True, k=4, L=200, min_movies=10, version='ml-latest-small', store_features=True)

Bases: AbstractDataset[Tensor]

MovieLens dataset for combinatorial contextual bandits.

The dataset is provided by GroupLens Research, specifically by Harper and Konstan (2015, The MovieLens Datasets: History and Context). It contains ratings of movies by different users. We do not use the ratings directly here, only the information that a user has rated, and therefore watched, a movie. More information can be found at https://www.grouplens.org/datasets/movielens/. We build the context from the SVD decomposition of the user-movie matrix; by default, the context is the outer product of the user and movie features.

Parameters:

  • dest_path (str, default './data') - The directory where the dataset is / will be stored.
  • svd_rank (int, default 20) - Rank (number of latent dimensions) for the SVD decomposition.
  • outer_product (bool, default True) - Whether to use the outer product of the user and movie features as the context. If False, the context will be the concatenation of the user and movie features (might perform better for neural bandits).
  • k (int, default 4) - The number of movies to exclude per user.
  • L (int, default 200) - The number of movies to include in the dataset (the top L most common movies).
  • min_movies (int, default 10) - The minimum number of movies a user must have rated to be included in the dataset (after only taking the top L movies).
  • version (Literal['ml-latest-small', 'ml-32m'], default 'ml-latest-small') - The version of the MovieLens dataset to use.
  • store_features (bool, default True) - Whether to store the user and movie features. If True, the features will be stored in dest_path.
Source code in src/calvera/benchmark/datasets/movie_lens.py
def __init__(
    self,
    dest_path: str = "./data",
    svd_rank: int = 20,
    outer_product: bool = True,
    k: int = 4,
    L: int = 200,
    min_movies: int = 10,
    version: Literal["ml-latest-small", "ml-32m"] = "ml-latest-small",
    store_features: bool = True,
):
    """Initialize the MovieLens dataset.

    Args:
        dest_path: The directory where the dataset is / will be stored.
        svd_rank: Rank (number of latent dimensions) for the SVD decomposition.
        outer_product: Whether to use the outer product of the user and movie features as the context. If `False`,
            the context will be the concatenation of the user and movie features.
            (Might perform better for Neural Bandits).
        k: The number of movies to exclude per user.
        L: The number of movies to include in the dataset. (Top `L` most common movies).
        min_movies: The minimum number of movies a user must have rated to be included in the dataset (after only
            taking the top `L` movies).
        version: The version of the MovieLens dataset to use. Either "ml-latest-small" or "ml-32m".
        store_features: Whether to store the user and movie features.
            If `True`, the features will be stored in `dest_path`.
    """
    super().__init__(needs_disjoint_contextualization=False)
    self.user_features, self.movie_features, self.history, self.F = _setup_movielens(
        dest_path=dest_path,
        svd_rank=svd_rank,
        k=k,
        L=L,
        min_movies=min_movies,
        version=version,
        store_features=store_features,
    )
    self.outer_product = outer_product

    # We can predict k movies per user. The idea is that we only predict a user once.
    self.num_actions = self.history.shape[-1]
    self.num_samples = self.user_features.shape[0]
    self.context_size = (
        self.user_features.shape[-1] * self.movie_features.shape[-1]
        if self.outer_product
        else self.user_features.shape[-1] + self.movie_features.shape[-1]
    )
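
With the default svd_rank of 20, and assuming the SVD assigns svd_rank latent dimensions to both the user and the movie side, the outer-product context has 20 * 20 = 400 features while the concatenated context has 20 + 20 = 40. A rough sketch (import path inferred from the source path above):

from calvera.benchmark.datasets.movie_lens import MovieLensDataset

outer = MovieLensDataset(dest_path="./data", svd_rank=20, outer_product=True)
concat = MovieLensDataset(dest_path="./data", svd_rank=20, outer_product=False)

print(outer.context_size)   # expected: 400 (20 * 20)
print(concat.context_size)  # expected: 40 (20 + 20)

contexts, rewards = outer[0]
print(contexts.shape)  # expected: (num_movies, 400), one context per movie/action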

__getitem__(idx)

Return the contextualized actions and rewards for a given index.

Parameters:

  • idx (int, required) - The index of the context in this dataset.

Returns:

  • contextualized_actions (Tensor) - The contextualized actions for the given index.
  • rewards (Tensor) - The rewards for each action, retrieved via self.reward.

Source code in src/calvera/benchmark/datasets/movie_lens.py
def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Return the contextualized actions and rewards for a given index.

    Args:
        idx: The index of the context in this dataset.

    Returns:
        contextualized_actions: The contextualized actions for the given index.
        rewards: The rewards for each action. Retrieved via `self.reward`.
    """
    # Get available actions (1 - history[userId - 1 = idx])
    available_actions = (1.0 - self.history[idx]).bool()

    # Get the context for each action
    contexts: torch.Tensor

    if self.outer_product:
        contexts = self.user_features[idx].unsqueeze(-1) * self.movie_features.unsqueeze(1)
        contexts = contexts.flatten(1)
    else:
        contexts = torch.cat(
            (
                self.user_features[idx].unsqueeze(0).expand(self.movie_features.size(0), -1),
                self.movie_features,
            ),
            dim=-1,
        )

    return contexts, torch.tensor(
        [
            self.reward(idx, movie_idx)
            for movie_idx in range(self.history.shape[-1])
            if available_actions[movie_idx]
        ],
        dtype=torch.float32,
    )

reward(idx, action)

Return the reward for a given index and action.

Returns 1 if the user watched the chosen movie in the held-out (future) interactions, 0 otherwise.

Parameters:

  • idx (int, required) - The index of the context in this dataset.
  • action (int, required) - The action for which the reward is requested.
Source code in src/calvera/benchmark/datasets/movie_lens.py
def reward(self, idx: int, action: int) -> float:
    """Return the reward for a given index and action.

    Returns 1 if the user watched the chosen movie in the held-out (future) interactions, 0 otherwise.

    Args:
        idx: The index of the context in this dataset.
        action: The action for which the reward is requested.
    """
    # An idx represents a user and the action is a movie.
    return self.F[idx, action].item()

StatlogDataset()

Bases: AbstractDataset[Tensor]

Loads the Statlog (Shuttle) dataset as a PyTorch Dataset from the UCI repository.

More information can be found at https://archive.ics.uci.edu/dataset/148/statlog+shuttle.

Loads the dataset from the UCI repository and stores it as PyTorch tensors.

Source code in src/calvera/benchmark/datasets/statlog.py
def __init__(self) -> None:
    """Initialize the Statlog (Shuttle) dataset.

    Loads the dataset from the UCI repository and stores it as PyTorch tensors.
    """
    super().__init__(needs_disjoint_contextualization=True)
    dataset = fetch_ucirepo(id=148)  # id=148 specifies the Statlog (Shuttle) dataset
    X = dataset.data.features
    y = dataset.data.targets

    self.X = torch.tensor(X.values, dtype=torch.float32)
    self.y = torch.tensor(y.values, dtype=torch.long)

__getitem__(idx)

Return the contextualized actions and rewards for a given index.

Parameters:

  • idx (int, required) - The index of the context in this dataset.

Returns:

  • contextualized_actions (Tensor) - The contextualized actions for the given index.
  • rewards (Tensor) - The rewards for each action, retrieved via self.reward.

Source code in src/calvera/benchmark/datasets/statlog.py
def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Return the contextualized actions and rewards for a given index.

    Args:
        idx: The index of the context in this dataset.

    Returns:
        contextualized_actions: The contextualized actions for the given index.
        rewards: The rewards for each action. Retrieved via `self.reward`.
    """
    contextualized_actions = self.contextualizer(self.X[idx].unsqueeze(0)).squeeze(0)
    rewards = torch.tensor(
        [self.reward(idx, action) for action in range(self.num_actions)],
        dtype=torch.float32,
    )

    return contextualized_actions, rewards

reward(idx, action)

Return the reward for a given index and action.

Returns 1 if the action is the same as the label, 0 otherwise.

Parameters:

  • idx (int, required) - The index of the context in this dataset.
  • action (int, required) - The action for which the reward is requested.
Source code in src/calvera/benchmark/datasets/statlog.py
def reward(self, idx: int, action: int) -> float:
    """Return the reward for a given index and action.

    Returns 1 if the action is the same as the label, 0 otherwise.

    Args:
        idx: The index of the context in this dataset.
        action: The action for which the reward is requested.
    """
    return float(self.y[idx] == action + 1)

TinyImageNetDataset(dest_path='./data', split='train', max_classes=10)

Bases: AbstractDataset[Tensor]

Loads the Tiny ImageNet dataset as a PyTorch Dataset.

More information can be found at https://cs231n.stanford.edu/reports/2015/pdfs/yle_project.pdf.

Tiny ImageNet has 200 classes with 500 training images, 50 validation images, and 50 test images per class. Each image is 64x64 pixels in 3 channels (RGB).

Parameters:

  • dest_path (str, default './data') - The directory where the dataset will be stored.
  • split (Literal['train', 'val', 'test'], default 'train') - Which split to use ('train', 'val', or 'test').
  • max_classes (int, default 10) - The maximum number of classes to use from the dataset.
Source code in src/calvera/benchmark/datasets/tiny_imagenet.py
def __init__(
    self,
    dest_path: str = "./data",
    split: Literal["train", "val", "test"] = "train",
    max_classes: int = 10,
) -> None:
    """Initialize the Tiny ImageNet dataset.

    Args:
        dest_path: The directory where the dataset will be stored.
        split: Which split to use ('train', 'val', or 'test')
        max_classes: The maximum number of classes to use from the dataset. Default is 10.
    """
    super().__init__(needs_disjoint_contextualization=False)

    self.dest_path = dest_path
    self.split = split

    self.image_dataset, self.y = _setup_tinyimagenet(
        dest_path=dest_path,
        split=split,
    )

    if max_classes < 200:
        self.image_dataset.classes = self.image_dataset.classes[:max_classes]
        self.image_dataset.class_to_idx = {c: i for i, c in enumerate(self.image_dataset.classes)}
        # filter out samples from classes not in the first `max_classes`
        self.image_dataset.samples = [
            (path, label) for path, label in self.image_dataset.samples if label < max_classes
        ]
        self.y = self.y[self.y < max_classes]
        self.num_actions = max_classes

    self.idx_to_class = {v: k for k, v in self.image_dataset.class_to_idx.items()}

    self.X = None

__getitem__(idx)

Return the context (flattened image) and rewards for a given index.

Parameters:

  • idx (int, required) - The index of the context in this dataset.

Returns:

  • context (Tensor) - The flattened image features as the context.
  • rewards (Tensor) - The rewards for each action (1.0 for the correct class, 0.0 otherwise).

Source code in src/calvera/benchmark/datasets/tiny_imagenet.py
def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Return the context (flattened image) and rewards for a given index.

    Args:
        idx: The index of the context in this dataset.

    Returns:
        context: The flattened image features as the context.
        rewards: The rewards for each action (1.0 for correct class, 0.0 otherwise).
    """
    image_tensor, _ = self.image_dataset[idx]

    # Flatten the image tensor from (C, H, W) to (C*H*W)
    context = image_tensor.view(1, -1)  # shape: (1, 3*64*64)

    rewards = torch.tensor(
        [self.reward(idx, action) for action in range(self.num_actions)],
        dtype=torch.float32,
    )

    return context, rewards

reward(idx, action)

Return the reward for a given index and action.

1.0 if the action is the same as the label, 0.0 otherwise.

Parameters:

  • idx (int, required) - The index of the context in this dataset.
  • action (int, required) - The action for which the reward is requested.
Source code in src/calvera/benchmark/datasets/tiny_imagenet.py
def reward(self, idx: int, action: int) -> float:
    """Return the reward for a given index and action.

    1.0 if the action is the same as the label, 0.0 otherwise.

    Args:
        idx: The index of the context in this dataset.
        action: The action for which the reward is requested.
    """
    return float(self.y[idx] == action)

WheelBanditDataset(num_samples, delta, mu_small=1.0, std_small=0.01, mu_medium=1.2, std_medium=0.01, mu_large=50.0, std_large=0.01, seed=None)

Bases: AbstractDataset[Tensor]

Generates a dataset for the Wheel Bandit problem.

Parameters:

  • num_samples (int, required) - Number of samples to generate.
  • delta (float, required) - Exploration parameter: high reward in one region if the norm is above delta.
  • mu_small (float, default 1.0) - Mean of the small reward distribution.
  • std_small (float, default 0.01) - Standard deviation of the small reward distribution.
  • mu_medium (float, default 1.2) - Mean of the medium reward distribution.
  • std_medium (float, default 0.01) - Standard deviation of the medium reward distribution.
  • mu_large (float, default 50.0) - Mean of the large reward distribution.
  • std_large (float, default 0.01) - Standard deviation of the large reward distribution.
  • seed (int | None, default None) - Seed for the random number generator.
Source code in src/calvera/benchmark/datasets/wheel.py
def __init__(
    self,
    num_samples: int,
    delta: float,
    mu_small: float = 1.0,
    std_small: float = 0.01,
    mu_medium: float = 1.2,
    std_medium: float = 0.01,
    mu_large: float = 50.0,
    std_large: float = 0.01,
    seed: int | None = None,
) -> None:
    """Initialize the Wheel Bandit dataset.

    Args:
        num_samples: Number of samples to generate.
        delta: Exploration parameter: high reward in one region if norm above delta
        mu_small: Mean of the small reward distribution.
        std_small: Standard deviation of the small reward distribution.
        mu_medium: Mean of the medium reward distribution.
        std_medium: Standard deviation of the medium reward distribution.
        mu_large: Mean of the large reward distribution.
        std_large: Standard deviation of the large reward distribution.
        seed: Seed for the random number generator.
    """
    super().__init__(needs_disjoint_contextualization=True)

    self.num_samples = num_samples
    self.delta = delta

    # Reward distributions
    self.mu_small = mu_small
    self.std_small = std_small
    self.mu_medium = mu_medium
    self.std_medium = std_medium
    self.mu_large = mu_large
    self.std_large = std_large

    data, rewards = self._generate_data(seed)
    self.data = data
    self.rewards = rewards
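
A short sketch of generating a Wheel dataset. In the standard wheel bandit setup the context is 2-dimensional and there are 5 actions, so the disjointly contextualized actions would have 5 rows of 10 features each; treat these numbers as assumptions about this implementation (import path inferred from the source path above):

from calvera.benchmark.datasets.wheel import WheelBanditDataset

dataset = WheelBanditDataset(num_samples=1000, delta=0.8, seed=42)

contextualized_actions, rewards = dataset[0]
print(contextualized_actions.shape)  # expected: (5, 10) for 5 actions and a 2-d context
print(rewards.shape)                 # expected: (5,)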

__getitem__(idx)

Return the contextualized actions and rewards for the context at index idx in this dataset.

Parameters:

  • idx (int, required) - The index of the context in this dataset.
Source code in src/calvera/benchmark/datasets/wheel.py
def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Return the contextualized actions and rewards for the context at index idx in this dataset.

    Args:
        idx: The index of the context in this dataset.
    """
    contextualized_actions = self.contextualizer(self.data[idx].unsqueeze(0)).squeeze(0)
    rewards = self.rewards[idx]

    return contextualized_actions, rewards

reward(idx, action)

Return the reward of the given action for the context at index idx in this dataset.

Source code in src/calvera/benchmark/datasets/wheel.py
def reward(self, idx: int, action: int) -> float:
    """Return the reward of the given action for the context at index idx in this dataset."""
    return self.rewards[idx, action].item()

Environment

BanditBenchmarkEnvironment(dataloader, device=None)

Bases: Generic[ActionInputType]

Environment that iterates over a DataLoader, yielding only contextualized_actions.

Internally stores rewards, which can be retrieved by a helper method. This is used to simulate a bandit environment with delayed feedback where the bandit can only see the actions and not the rewards. The bandit should first sample contextualized_actions by iterating over the environment. The bandit can then choose the best actions. Finally, the bandit receives rewards by calling get_feedback(chosen_actions). Since this is a simulation, the bandit can also compute the regret by calling compute_regret(chosen_actions).

Usage:

environment = BanditBenchmarkEnvironment(dataloader)
for contextualized_actions in environment:
    chosen_actions, p = bandit.forward(contextualized_actions)  # one-hot tensor
    chosen_contextualized_actions, realized_rewards = environment.get_feedback(chosen_actions)
    bandit.record_feedback(chosen_contextualized_actions, realized_rewards)

    # optional: compute regret
    regret = environment.compute_regret(chosen_actions)

Parameters:

  • dataloader (DataLoader[tuple[ActionInputType, Tensor]], required) - DataLoader that yields batches of (contextualized_actions, all_rewards) tuples.
  • device (torch.device | None, default None) - The device the tensors should be moved to. If None, the default device is used.
Source code in src/calvera/benchmark/environment.py
def __init__(
    self,
    dataloader: DataLoader[tuple[ActionInputType, torch.Tensor]],
    device: torch.device | None = None,
) -> None:
    """Initializes a BanditBenchmarkEnvironment.

    Args:
        dataloader: DataLoader that yields batches of (contextualized_actions, all_rewards) tuples.
        device: The device the tensors should be moved to. If None, the default device is used.
    """
    self._dataloader: DataLoader[tuple[ActionInputType, torch.Tensor]] = dataloader
    self._iterator: _BaseDataLoaderIter | None = None
    self._last_contextualized_actions: ActionInputType | None = None
    self._last_all_rewards: torch.Tensor | None = None
    self.device = device
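
A sketch of wiring a dataset into the environment. The DataLoader simply batches (contextualized_actions, all_rewards) pairs; the import paths are inferred from the source paths in this document:

import torch
from torch.utils.data import DataLoader

from calvera.benchmark.datasets.covertype import CovertypeDataset
from calvera.benchmark.environment import BanditBenchmarkEnvironment

dataset = CovertypeDataset(dest_path="./data")
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
environment = BanditBenchmarkEnvironment(dataloader, device=device)

for contextualized_actions in environment:
    # contextualized_actions has shape (batch_size, num_actions, context_size);
    # pick actions with your bandit here, then call environment.get_feedback(...).
    break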

__iter__()

Returns an iterator object for the BanditBenchmarkEnvironment.

This method initializes an iterator for the dataloader and returns the BanditBenchmarkEnvironment instance itself, allowing it to be used as an iterator in a loop. Needs to be called before the first iteration.

Returns:

  • BanditBenchmarkEnvironment[ActionInputType] - The instance of the environment itself.

Source code in src/calvera/benchmark/environment.py
def __iter__(self) -> "BanditBenchmarkEnvironment[ActionInputType]":
    """Returns an iterator object for the BanditBenchmarkEnvironment.

    This method initializes an iterator for the dataloader and returns the
    BanditBenchmarkEnvironment instance itself, allowing it to be used as an
    iterator in a loop. Needs to be called before the first iteration.

    Returns:
        BanditBenchmarkEnvironment: The instance of the environment itself.
    """
    self._iterator = iter(self._dataloader)
    return self

__next__()

Returns the next batch of contextualized actions from the DataLoader.

Returns:

  • ActionInputType - The contextualized actions for the bandit to pick from.

Raises:

  • AssertionError - If the iterator is not initialized with __iter__.

Source code in src/calvera/benchmark/environment.py
def __next__(self) -> ActionInputType:
    """Returns the next batch of contextualized actions from the DataLoader.

    Returns:
        The contextualized actions for the bandit to pick from.

    Raises:
        AssertionError: If the iterator is not initialized with `__iter__`.
    """
    assert self._iterator is not None, "No iterator was created."

    # Retrieve one batch from the DataLoader
    batch = next(self._iterator)
    contextualized_actions: ActionInputType = batch[0]
    all_rewards: torch.Tensor = batch[1].to(device=self.device)

    if isinstance(contextualized_actions, torch.Tensor):
        batch_size, num_actions = contextualized_actions.shape[:2]
        contextualized_actions = cast(ActionInputType, contextualized_actions.to(device=self.device))
    elif isinstance(contextualized_actions, tuple | list):
        contextualized_actions = cast(
            ActionInputType,
            tuple(action_tensor.to(device=self.device) for action_tensor in contextualized_actions),
        )
        batch_size, num_actions = contextualized_actions[0].shape[:2]
    else:
        raise ValueError(
            f"contextualized_actions must be a torch.Tensor or a tuple. Received {type(contextualized_actions)}."
        )

    assert batch_size == all_rewards.size(0), (
        f"Mismatched batch size of contextualized_actions and all_rewards tensors."
        f"Received {batch_size} and {all_rewards.size(0)}."
    )
    assert num_actions == all_rewards.size(1) or num_actions == 1, (
        f"Mismatched number of actions in contextualized_actions and all_rewards tensors."
        f"Received {num_actions} and {all_rewards.size(1)}."
    )

    # Store them so we can fetch them later when building the update dataset
    self._last_contextualized_actions = contextualized_actions
    self._last_all_rewards = all_rewards
    # Return only the contextualized actions for the bandit to pick from
    return contextualized_actions

get_feedback(chosen_actions)

Returns the chosen actions & realized rewards of the last batch.

For combinatorial bandits, this feedback is semi-bandit feedback.

Parameters:

  • chosen_actions (Tensor, required) - Shape (n, m), one-hot, possibly with multiple "1"s. The actions chosen by the bandit. Every row must contain at least one chosen action ("1"), and all rows must contain the same number of chosen actions.

Returns:

  • tuple[ActionInputType, Tensor] - A tuple of the chosen contextualized actions (shape: (n, m, k)) and the realized rewards (shape: (n, m)).

Source code in src/calvera/benchmark/environment.py
def get_feedback(self, chosen_actions: torch.Tensor) -> tuple[ActionInputType, torch.Tensor]:
    """Returns the chosen actions & realized rewards of the last batch.

    For combinatorial bandits, this feedback is semi-bandit feedback.

    Args:
        chosen_actions: shape (n, m) (one-hot, possibly multiple "1"s). The actions chosen by the bandit. Must
            contain at least one and the same number of chosen actions ("1s") for all rows.

    Returns:
        A tuple of the chosen contextualized actions (shape: (n, m, k)) and the realized rewards (shape: (n, m)).
    """
    self._validate_chosen_actions(chosen_actions)

    chosen_contextualized_actions = self._get_chosen_contextualized_actions(chosen_actions)
    realized_rewards = self._get_realized_rewards(chosen_actions)

    return (
        chosen_contextualized_actions,
        realized_rewards,
    )

compute_regret(chosen_actions)

Computes the regret for the most recent batch.

Definition

best_reward = max over top i actions (where i is the number of chosen actions)
chosen_reward = sum over chosen actions (handles multiple 1s per row)
regret = best_reward - chosen_reward

Important: For combinatorial bandits, this assumes that the reward of a super-action is the sum of the rewards of its chosen arms.

Parameters:

  • chosen_actions (Tensor, required) - Shape (n, k), one-hot, possibly with multiple "1"s. The actions chosen by the bandit. Every row must contain at least one chosen action ("1"), and all rows must contain the same number of chosen actions.

Returns:

  • Tensor - Tensor of regrets, shape (n,).

Source code in src/calvera/benchmark/environment.py
def compute_regret(self, chosen_actions: torch.Tensor) -> torch.Tensor:
    """Computes the regret for the most recent batch.

    Definition:
      best_reward = max over top i actions (where i is the number of chosen actions)
      chosen_reward = sum over chosen actions (handles multiple 1s per row)
      regret = best_reward - chosen_reward
    Important: For combinatorial bandits assumes that the reward of a super-action is the sum of each chosen arm.

    Args:
        chosen_actions: shape (n, k), one-hot, possibly multiple "1"s. The actions chosen by the bandit. Must
            contain at least one and the same number of chosen actions ("1s") for all rows.

    Returns:
        Tensor of regrets shape (n, ).
    """
    self._validate_chosen_actions(chosen_actions)

    best_action_rewards = self._get_best_action_rewards(chosen_actions).sum(dim=1)
    chosen_reward = self._get_realized_rewards(chosen_actions).sum(dim=1)

    eps = 1e-4
    assert torch.all(best_action_rewards >= chosen_reward - eps), (
        "Best action rewards should be greater than chosen rewards. "
        f"Best: {best_action_rewards}, Chosen: {chosen_reward}"
    )

    regret = best_action_rewards - chosen_reward
    # clamp to 0 to avoid negative regrets
    regret = torch.clamp(regret, min=0.0)
    return regret
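
As a concrete illustration of the definition above, using plain tensors rather than the environment API: if a sample had per-action rewards [0.1, 0.9, 0.4] and the bandit chose action 0, the best achievable reward is 0.9, the realized reward is 0.1, and the regret is 0.8.

import torch

all_rewards = torch.tensor([[0.1, 0.9, 0.4]])     # rewards of every action, one row per sample
chosen_actions = torch.tensor([[1.0, 0.0, 0.0]])  # one-hot choice: action 0

chosen_reward = (all_rewards * chosen_actions).sum(dim=1)      # tensor([0.1000])
best_reward = all_rewards.topk(k=1, dim=1).values.sum(dim=1)   # tensor([0.9000])
regret = torch.clamp(best_reward - chosen_reward, min=0.0)     # tensor([0.8000])
print(regret)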