Calvera Usage Examples
This guide demonstrates how to use the Calvera library for multi-armed bandit algorithms. We'll cover a range of bandits, from simple linear models to more complex neural bandits, and show how to integrate them into your workflows.
Quick Start
Let's start with a simple example using Linear Thompson Sampling:
import torch
import lightning as pl
from calvera.bandits import LinearTSBandit
# 1. Create a bandit for a linear model with 128 features
N_FEATURES = 128
bandit = LinearTSBandit(n_features=N_FEATURES)
# 2. Generate sample data (batch_size, n_actions, n_features) and perform inference
data = torch.randn(100, 1, N_FEATURES)
chosen_arms_one_hot, probabilities = bandit(data)
chosen_arms = chosen_arms_one_hot.argmax(dim=1)
# 3. Retrieve rewards for the chosen arms
rewards = torch.randn(100, 1)
# 4. Add the data to the bandit
chosen_contextualized_actions = data[torch.arange(data.size(0)), chosen_arms].unsqueeze(1)
bandit.record_feedback(chosen_contextualized_actions, rewards)
# 5. Train the bandit
trainer = pl.Trainer(
max_epochs=1,
enable_progress_bar=False,
enable_model_summary=False,
accelerator="auto",
)
trainer.fit(bandit)
# (6. Repeat the process as needed)
This simple example demonstrates the core workflow with Calvera bandits:
- Create a bandit model
- Perform inference on your data to get action selections
- Collect rewards for chosen actions
- Record the data in the bandit's buffer
- Train the bandit using PyTorch Lightning
Now, let's explore the different types of bandits in more detail.
Linear Bandits
Linear bandits model the expected reward as a linear function of the context. Calvera provides two main variants: Linear Thompson Sampling and Linear UCB.
Linear Thompson Sampling
Linear Thompson Sampling (LinTS) maintains a Bayesian posterior over the model parameters and, at each time step, plays an arm according to its posterior probability of being the best arm.
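Conceptually, each LinTS round draws a single parameter sample from the posterior and then acts greedily with respect to that sample. The following minimal sketch illustrates the idea with plain PyTorch; it is not Calvera's internal implementation:
import torch
# Posterior over the linear parameters theta (a standard-normal posterior, for illustration only)
n_features, n_arms = 4, 3
mu = torch.zeros(n_features)                  # posterior mean
cov = torch.eye(n_features)                   # posterior covariance
contexts = torch.randn(n_arms, n_features)    # one context vector per arm
# Thompson Sampling step: sample theta, then pick the arm with the highest sampled reward
theta_sample = torch.distributions.MultivariateNormal(mu, cov).sample()
scores = contexts @ theta_sample
chosen_arm = scores.argmax().item()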
Let's look at a practical example using the StatLog dataset:
import torch
import lightning as pl
from torch.utils.data import DataLoader, Subset
from calvera.bandits import LinearTSBandit
from calvera.benchmark.datasets import StatlogDataset
from calvera.benchmark import BanditBenchmarkEnvironment
from calvera.utils import ArgMaxSelector
# Load the StatLog dataset
dataset = StatlogDataset()
print(f"Dataset context size: {dataset.context_size}")
print(f"Dataset sample count: {len(dataset)}")
# Create data loader for a subset of the data
train_loader = DataLoader(Subset(dataset, range(10000)), batch_size=32, shuffle=True)
# Set up the environment
accelerator = "cpu"
env = BanditBenchmarkEnvironment(train_loader, device=accelerator)
# Initialize the Linear TS bandit
bandit_module = LinearTSBandit(
n_features=dataset.context_size,
selector=ArgMaxSelector(),
lazy_uncertainty_update=True,
).to(accelerator)
Now we can run the training loop:
import numpy as np
import pandas as pd
from tqdm import tqdm
rewards = np.array([])
regrets = np.array([])
progress = tqdm(iter(env), total=len(env))
for contextualized_actions in progress:
# 1. Select actions
chosen_actions, _ = bandit_module.forward(contextualized_actions)
# 2. Create a trainer for this step
trainer = pl.Trainer(
max_epochs=1,
enable_progress_bar=False,
enable_model_summary=False,
accelerator=accelerator,
)
# 3. Get feedback from environment
chosen_contextualized_actions, realized_rewards = env.get_feedback(chosen_actions)
batch_regret = env.compute_regret(chosen_actions)
# 4. Track metrics
rewards = np.append(rewards, realized_rewards.cpu().numpy())
regrets = np.append(regrets, batch_regret.cpu().numpy())
progress.set_postfix({
"reward": realized_rewards.mean().item(),
"regret": batch_regret.mean().item(),
"avg_regret": regrets.mean()
})
# 5. Update the bandit
bandit_module.record_feedback(chosen_contextualized_actions, realized_rewards)
trainer.fit(bandit_module)
bandit_module = bandit_module.to(accelerator)
# Store metrics
metrics = pd.DataFrame({
"reward": rewards,
"regret": regrets,
})
We can then analyze and visualize our results:
import matplotlib.pyplot as plt
# Calculate cumulative metrics
cumulative_reward = np.cumsum(metrics["reward"])
cumulative_regret = np.cumsum(metrics["regret"])
# Plot results
plt.figure(figsize=(10, 5))
plt.plot(cumulative_reward, label="reward")
plt.plot(cumulative_regret, label="regret")
plt.xlabel("steps")
plt.ylabel("cumulative reward/regret")
plt.legend()
plt.show()
# Print average metrics at different time horizons
print(f"Average reward (first 10 rounds): {np.mean(metrics['reward'][:10]):.4f}")
print(f"Average reward (first 100 rounds): {np.mean(metrics['reward'][:100]):.4f}")
print(f"Average reward (all rounds): {np.mean(metrics['reward']):.4f}")
print("")
print(f"Average regret (first 10 rounds): {np.mean(metrics['regret'][:10]):.4f}")
print(f"Average regret (first 100 rounds): {np.mean(metrics['regret'][:100]):.4f}")
print(f"Average regret (all rounds): {np.mean(metrics['regret']):.4f}")
Linear UCB
The Linear UCB (LinUCB) algorithm uses an upper confidence bound strategy to balance exploration and exploitation.
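Conceptually, LinUCB scores each arm by its predicted reward plus an exploration bonus derived from the model's uncertainty about that arm. A minimal illustrative sketch (not Calvera's internal code), where exploration_rate plays the role of the bonus weight alpha:
import torch
n_features, n_arms = 4, 3
theta_hat = torch.zeros(n_features)           # current parameter estimate
A_inv = torch.eye(n_features)                 # inverse precision (covariance) matrix
alpha = 1.0                                   # exploration_rate
contexts = torch.randn(n_arms, n_features)
# UCB score per arm: predicted reward + alpha * sqrt(x^T A^-1 x)
bonus = torch.sqrt((contexts @ A_inv * contexts).sum(dim=1))
scores = contexts @ theta_hat + alpha * bonus
chosen_arm = scores.argmax().item()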
The interface is similar to Linear TS, with the main difference being the algorithm used:
from calvera.bandits import LinearUCBBandit
# Initialize the Linear UCB bandit
bandit_module = LinearUCBBandit(
n_features=dataset.context_size,
selector=ArgMaxSelector(),
exploration_rate=1.0, # Controls the amount of exploration
lazy_uncertainty_update=True,
).to(accelerator)
# The rest of the code remains the same as in the Linear TS example
View complete notebook on GitHub
Neural Linear
Neural Linear performs a Bayesian linear regression on top of the representation of the last layer of a neural network.
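In other words, the network acts as a learned feature map, and a Bayesian linear head on top of that embedding produces the reward estimate. A rough conceptual sketch (illustrative only, not Calvera's internals):
import torch
import torch.nn as nn
# The network maps the raw context to a lower-dimensional embedding ...
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
# ... and a (Bayesian) linear head on the embedding predicts the reward
theta = torch.zeros(10)  # posterior mean of the linear head's parameters
context = torch.randn(16)
embedding = encoder(context)
expected_reward = embedding @ theta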
Let's see an example:
import torch.nn as nn
from calvera.bandits import NeuralLinearBandit
from calvera.utils import TensorDataBuffer, AllDataRetrievalStrategy
# Define a neural network architecture
class Network(nn.Module):
def __init__(self, dim, hidden_size=100, n_embedding_size=10):
super().__init__()
self.fc1 = nn.Linear(dim, hidden_size)
self.activate = nn.ReLU()
self.fc2 = nn.Linear(hidden_size, n_embedding_size)
def forward(self, x):
return self.fc2(self.activate(self.fc1(x)))
# Load dataset
dataset = StatlogDataset()
network = Network(dataset.context_size, hidden_size=100, n_embedding_size=10)
# Set up buffer for storing interaction data
buffer = TensorDataBuffer(
retrieval_strategy=AllDataRetrievalStrategy(),
max_size=10000,
)
# Create data loader
train_loader = DataLoader(Subset(dataset, range(10000)), batch_size=256, shuffle=True)
env = BanditBenchmarkEnvironment(train_loader)
# Initialize the Neural Linear bandit
bandit_module = NeuralLinearBandit(
n_embedding_size=10,
network=network,
buffer=buffer,
train_batch_size=32,
early_stop_threshold=1e-3,
weight_decay=1e-3,
learning_rate=1e-3,
min_samples_required_for_training=1024,
initial_train_steps=2048,
)
Now we can run the training loop, which is similar to the Linear TS example, but with a few differences:
from lightning.pytorch.loggers.csv_logs import CSVLogger
from calvera.benchmark import OnlineBanditLoggerDecorator
# Set up logger
logger = OnlineBanditLoggerDecorator(
CSVLogger("logs", name="neural_linear_bandit", flush_logs_every_n_steps=100),
enable_console_logging=False,
)
rewards = np.array([])
regrets = np.array([])
progress_bar = tqdm(iter(env), total=len(env))
for contextualized_actions in progress_bar:
# 1. Select actions
chosen_actions, _ = bandit_module.forward(contextualized_actions)
# 2. Create a trainer for this step
trainer = pl.Trainer(
max_epochs=1,
max_steps=1000,
logger=logger,
log_every_n_steps=1,
enable_progress_bar=False,
enable_model_summary=False,
enable_checkpointing=False,
)
# 3. Get feedback from environment
chosen_contextualized_actions, realized_rewards = env.get_feedback(chosen_actions)
batch_regret = env.compute_regret(chosen_actions)
# 4. Track metrics
rewards = np.append(rewards, realized_rewards.cpu().numpy())
regrets = np.append(regrets, batch_regret.cpu().numpy())
progress_bar.set_postfix(
reward=realized_rewards.mean().item(),
regret=batch_regret.mean().item(),
average_regret=regrets.mean(),
)
# 5. Update the bandit
bandit_module.record_feedback(chosen_contextualized_actions, realized_rewards)
trainer.fit(bandit_module)
The key differences in Neural Linear compared to Linear bandits:
- We need to define a neural network architecture
- We use a buffer to store interaction data
- The training process involves both updating the neural network and the linear head
In Neural Linear, the neural network and the Bayesian linear regression components are updated at different time scales.
View complete notebook on GitHub.
Neural Bandits
Neural bandits use neural networks to model the expected reward function, which allows them to capture non-linear relationships between contexts and rewards.
Neural Thompson Sampling
Neural Thompson Sampling (NeuralTS) uses deep neural networks for both exploration and exploitation by sampling rewards from a posterior distribution based on the network's uncertainty.
Here's how to use it:
from calvera.bandits import NeuralTSBandit
# Define a simple network architecture
class Network(nn.Module):
def __init__(self, dim, hidden_size=100):
super().__init__()
self.fc1 = nn.Linear(dim, hidden_size)
self.activate = nn.ReLU()
self.fc2 = nn.Linear(hidden_size, 1)
def forward(self, x):
return self.fc2(self.activate(self.fc1(x)))
# Load dataset and create network
dataset = StatlogDataset()
network = Network(dataset.context_size, hidden_size=100)
# Set up buffer
buffer = TensorDataBuffer(
retrieval_strategy=AllDataRetrievalStrategy(),
max_size=10000,
)
# Create data loader
train_loader = DataLoader(Subset(dataset, range(10000)), batch_size=256, shuffle=True)
env = BanditBenchmarkEnvironment(train_loader)
# Initialize the Neural TS bandit
bandit_module = NeuralTSBandit(
n_features=dataset.context_size,
network=network,
buffer=buffer,
train_batch_size=32,
early_stop_threshold=1e-3,
weight_decay=1e-3,
exploration_rate=1e-5,
learning_rate=1e-3,
min_samples_required_for_training=1024,
initial_train_steps=2048,
)
The training loop is very similar to the Neural Linear case:
# Set up logger
logger = OnlineBanditLoggerDecorator(
CSVLogger("logs", name="neural_ts_bandit", flush_logs_every_n_steps=100),
enable_console_logging=False
)
# Then run the same training loop as with Neural Linear
View complete notebook on GitHub
Neural UCB
Neural UCB (NeuralUCB) uses neural networks to estimate rewards and guides exploration through upper confidence bounds based on the network's uncertainty.
Here's how to use it:
from calvera.bandits import NeuralUCBBandit
# Network architecture is the same as for Neural TS
# Initialize the Neural UCB bandit
bandit_module = NeuralUCBBandit(
n_features=dataset.context_size,
network=network,
buffer=buffer,
train_batch_size=32,
early_stop_threshold=1e-3,
weight_decay=1e-3,
exploration_rate=1e-5, # Controls the amount of exploration
learning_rate=1e-3,
min_samples_required_for_training=128,
initial_train_steps=2048,
)
# The training loop is the same as for Neural TS
The main difference between NeuralUCB and NeuralTS is how they compute their action-selection scores (see the sketch after this list):
- NeuralUCB adds an uncertainty bonus to the predicted reward
- NeuralTS samples from an approximated posterior over the neural network outputs
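The toy sketch below contrasts the two rules. Here mean and sigma simply stand in for the network's predicted reward and its uncertainty estimate; this is illustrative only, not Calvera's internal code:
import torch
mean = torch.tensor([0.2, 0.5, 0.4])      # predicted reward per arm
sigma = torch.tensor([0.30, 0.05, 0.10])  # uncertainty estimate per arm
nu = 1.0                                  # exploration_rate
ucb_scores = mean + nu * sigma              # NeuralUCB: deterministic optimism bonus
ts_scores = torch.normal(mean, nu * sigma)  # NeuralTS: one posterior sample per arm
print(ucb_scores.argmax().item(), ts_scores.argmax().item())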
View complete notebook on GitHub
Combinatorial Bandits
Combinatorial bandits extend the standard multi-armed bandit framework by allowing the selection of multiple arms simultaneously. This is particularly useful in settings where actions are combinations (e.g., recommending a slate of items) and the reward depends on the joint selection.
Combinatorial Neural Thompson Sampling
Combinatorial Neural Thompson Sampling adapts the neural network-based Thompson Sampling algorithm to the combinatorial setting.
Let's explore an example using a synthetic dataset and Neural Thompson Sampling:
from calvera.benchmark.datasets import SyntheticCombinatorialDataset
from calvera.bandits import NeuralTSBandit
from calvera.utils import TopKSelector
# Define a Neural Network for the Combinatorial Bandit
class Network(nn.Module):
def __init__(self, dim, hidden_size=100):
super().__init__()
self.fc1 = nn.Linear(dim, hidden_size)
self.activate = nn.ReLU()
self.fc2 = nn.Linear(hidden_size, 1)
def forward(self, x):
return self.fc2(self.activate(self.fc1(x)))
# Create a Synthetic Combinatorial Dataset
dataset = SyntheticCombinatorialDataset(
n_samples=2000,
num_actions=20,
context_size=40,
function_type="quadratic",
noise_std=0,
seed=42
)
Now, let's set up the Combinatorial Neural Thompson Sampling bandit:
# Set up the environment
train_loader = DataLoader(Subset(dataset, range(2000)), batch_size=100, shuffle=True)
env = BanditBenchmarkEnvironment(train_loader)
# Create a data buffer
buffer = TensorDataBuffer(
retrieval_strategy=AllDataRetrievalStrategy(),
max_size=None,
)
# Initialize the Neural TS bandit with a TopK selector
K = 4  # number of arms to select per round (choose to fit your task)
bandit_module = NeuralTSBandit(
n_features=dataset.context_size,
network=Network(dataset.context_size, hidden_size=100),
selector=TopKSelector(k=K),
buffer=buffer,
train_batch_size=32,
early_stop_threshold=1e-4,
weight_decay=1e-4,
exploration_rate=1,
learning_rate=1e-2,
min_samples_required_for_training=16,
initial_train_steps=1024,
num_samples_per_arm=10,
)
# Training Loop
rewards = np.array([])
regrets = np.array([])
progress_bar = tqdm(enumerate(env), total=len(env))
for i, contextualized_actions in progress_bar:
# Select actions
chosen_actions, _ = bandit_module.forward(contextualized_actions)
# Create a trainer
trainer = pl.Trainer(
max_epochs=1,
max_steps=1000,
gradient_clip_val=20.0,
log_every_n_steps=1,
enable_progress_bar=False,
enable_model_summary=False,
enable_checkpointing=False,
)
# Get feedback from environment
chosen_contextualized_actions, realized_scores = env.get_feedback(chosen_actions)
realized_rewards = realized_scores.sum(dim=1)
batch_regret = env.compute_regret(chosen_actions)
# Track metrics
rewards = np.append(rewards, realized_rewards.cpu().numpy())
regrets = np.append(regrets, batch_regret.cpu().numpy())
progress_bar.set_postfix(
reward=realized_rewards.mean().item(),
regret=batch_regret.mean().item(),
avg_regret=np.mean(regrets),
acc_regret=np.sum(regrets),
)
# Update the bandit
bandit_module.record_feedback(chosen_contextualized_actions, realized_scores)
trainer.fit(bandit_module)
Note that we are still using a NeuralTSBandit module here; the key difference is the TopKSelector, which enables the selection of multiple actions simultaneously. The selector can be swapped out to match your use case. (See Customization for more details on adapting selectors.)
View complete notebook on GitHub
Combinatorial Neural UCB
Similar to Combinatorial Neural Thompson Sampling, Neural UCB (NeuralUCB) can be applied to combinatorial action spaces:
from calvera.bandits import NeuralUCBBandit
# Initialize the Neural UCB bandit
bandit_module = NeuralUCBBandit(
n_features=dataset.context_size,
network=Network(dataset.context_size, hidden_size=100),
selector=TopKSelector(k=K),
buffer=buffer,
train_batch_size=32,
early_stop_threshold=1e-4,
weight_decay=1e-4,
exploration_rate=1,
learning_rate=1e-2,
min_samples_required_for_training=16,
initial_train_steps=1024,
)
# The training loop remains the same as the Neural Thompson Sampling example
View complete notebook on GitHub
Customization
Calvera is designed to be highly customizable, allowing you to adapt it to your specific needs.
Selectors
Selectors determine how actions are chosen based on the scores produced by the bandit. Calvera provides three built-in selectors:
- ArgMaxSelector: Chooses the arm with the highest score
- EpsilonGreedySelector: Uses epsilon-greedy exploration
- TopKSelector: Selects the top k arms (for combinatorial bandits)
Here's how to use them:
from calvera.utils import ArgMaxSelector, EpsilonGreedySelector, TopKSelector
# Use ArgMaxSelector (default if none specified)
bandit = LinearTSBandit(
n_features=n_features,
selector=ArgMaxSelector()
)
# Use EpsilonGreedySelector for exploration
bandit = LinearTSBandit(
n_features=n_features,
selector=EpsilonGreedySelector(epsilon=0.1)
)
# Use TopKSelector for combinatorial bandits
bandit = LinearTSBandit(
n_features=n_features,
selector=TopKSelector(k=3) # Select top 3 arms
)
You can also implement custom selectors by subclassing AbstractSelector:
from calvera.utils import AbstractSelector
class MyCustomSelector(AbstractSelector):
def __call__(self, scores: torch.Tensor) -> torch.Tensor:
# Your custom selection logic here
# scores shape: (batch_size, n_arms)
# should return one-hot encoding of shape (batch_size, n_arms)
pass
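For instance, here is a minimal hypothetical selector implementing softmax (Boltzmann) exploration. The class name and behavior are illustrative examples, not part of Calvera:
import torch
from calvera.utils import AbstractSelector
class SoftmaxSelector(AbstractSelector):
    def __call__(self, scores: torch.Tensor) -> torch.Tensor:
        # scores: (batch_size, n_arms) -> one-hot actions of shape (batch_size, n_arms)
        probs = torch.softmax(scores, dim=-1)
        chosen = torch.multinomial(probs, num_samples=1).squeeze(-1)
        return torch.nn.functional.one_hot(chosen, num_classes=scores.shape[-1])
# Hypothetical usage:
# bandit = LinearTSBandit(n_features=n_features, selector=SoftmaxSelector())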
Data Storage Strategies
Calvera provides different strategies for managing interaction data:
- AllDataRetrievalStrategy: Stores all data
- SlidingWindowRetrievalStrategy: Stores only the most recent data
Here's how to use them:
from calvera.utils import TensorDataBuffer, AllDataRetrievalStrategy, SlidingWindowRetrievalStrategy
# Store all data
buffer = TensorDataBuffer(
retrieval_strategy=AllDataRetrievalStrategy(),
max_size=10000, # Optional: limit to 10000 samples
)
# Store only the most recent 1000 samples
buffer = TensorDataBuffer(
retrieval_strategy=SlidingWindowRetrievalStrategy(window_size=1000),
max_size=None, # No overall limit
)
Networks
Calvera supports a variety of network architectures for contextual bandits. This section demonstrates how to create and use custom networks with the framework.
Custom Network Architectures
For neural bandits, you can provide any PyTorch neural network:
import torch.nn as nn
# Simple MLP
network = nn.Sequential(
nn.Linear(input_size, 64),
nn.ReLU(),
nn.Linear(64, output_size)
)
# More complex architecture
class MyCustomNetwork(nn.Module):
def __init__(self, input_dim, embedding_dim):
super().__init__()
self.fc1 = nn.Linear(input_dim, 256)
self.activation = nn.ReLU()
self.dropout = nn.Dropout(0.2)
self.fc2 = nn.Linear(256, embedding_dim)
def forward(self, x):
x = self.activation(self.fc1(x))
x = self.dropout(x)
return self.fc2(x)
# Use the custom network
network = MyCustomNetwork(input_dim=128, embedding_dim=32)
bandit = NeuralUCBBandit(
n_features=128,  # must match the network's input dimension
network=network,
# ... other parameters
)
Using Transformers Models (BERT) with Neural Linear Bandits
You can use BERT models from the Hugging Face Transformers library as feature extractors, specifically with the Neural Linear bandit algorithm. Here's an example of integrating a BERT model with a Neural Linear bandit:
from transformers import BertModel
from calvera.bandits import NeuralLinearBandit
from calvera.benchmark import BertWrapper
# Load a pre-trained BERT model
bert_model = BertModel.from_pretrained(
"google/bert_uncased_L-2_H-128_A-2",
output_hidden_states=True
).eval()
# Wrap the BERT model for compatibility with Calvera
network = BertWrapper(bert_model)
# Create the Neural Linear Bandit with the BERT model
bandit = NeuralLinearBandit(
network=network,
buffer=buffer,
n_embedding_size=128,
contextualization_after_network=True, # Important for transformer models
n_arms=2,
initial_train_steps=128,
min_samples_required_for_training=128,
)
Key Considerations for BERT Integration
- BertWrapper: Use the BertWrapper class to make BERT models compatible with Calvera's bandits.
- Contextualization: Set contextualization_after_network=True when using transformer models.
- Data Processing: Ensure your data is properly tokenized and formatted for BERT models. You can use a custom collate function for your DataLoader (see the sketch below).
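As a sketch of the data-processing point above, a custom collate function can tokenize raw text before it reaches the bandit. The item layout (text, reward), the tokenizer checkpoint, and the function name below are assumptions for illustration, not Calvera API:
import torch
from torch.utils.data import DataLoader
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained("google/bert_uncased_L-2_H-128_A-2")
def collate_texts(batch):
    # assumes each dataset item is a (text, reward) pair
    texts, rewards = zip(*batch)
    encoded = tokenizer(
        list(texts),
        padding=True,
        truncation=True,
        max_length=128,
        return_tensors="pt",
    )
    # depending on your setup you may also need the attention mask
    return encoded["input_ids"], torch.tensor(rewards)
# dataloader = DataLoader(text_dataset, batch_size=16, collate_fn=collate_texts)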
For a complete example of using BERT with Neural Linear Bandits for sentiment analysis on the IMDB Movie Reviews dataset, see the neural_linear_bert notebooks in the Calvera repository.
Data Samplers
Calvera provides custom data samplers that give you control over how data is sampled from your datasets during training. These can be particularly useful when you want to prioritize certain samples or implement specialized sampling strategies.
Random Data Sampler
The RandomDataSampler samples elements randomly without replacement:
from torch.utils.data import DataLoader
from calvera.utils import RandomDataSampler
from calvera.benchmark.datasets import StatlogDataset
# Create a dataset
dataset = StatlogDataset()
# Create a random sampler with optional reproducibility
generator = torch.Generator().manual_seed(42) # For reproducible sampling
sampler = RandomDataSampler(dataset, generator=generator)
# Use the sampler with a DataLoader
dataloader = DataLoader(
dataset,
batch_size=32,
sampler=sampler, # Use our custom sampler instead of shuffle=True
num_workers=4
)
# This dataloader will now sample data randomly using our sampler
for batch in dataloader:
# Process batch...
pass
Sorted Data Sampler
The SortedDataSampler allows you to sample elements in a specific order based on a key function:
from calvera.utils import SortedDataSampler
# Define a key function that determines the sorting order
def key_fn(idx):
# Sort by the sum of features for each sample
return dataset.X[idx].sum().item()
# Create a sorted sampler
sampler = SortedDataSampler(dataset, key_fn=key_fn, reverse=True) # reverse=True for descending order
# Use the sampler with a DataLoader
dataloader = DataLoader(
dataset,
batch_size=32,
sampler=sampler,
num_workers=4
)
# This dataloader will provide batches in descending order of feature sums
for batch in dataloader:
# Process batch...
pass
This approach is useful for curriculum learning or when you want to train first on samples with certain characteristics.
Checkpointing
Calvera supports checkpointing via PyTorch Lightning's checkpoint system. This allows you to save and resume training, which is especially valuable for long-running experiments or when deploying models to production.
Saving Checkpoints
Here's how to enable checkpointing:
import lightning as pl
from lightning.pytorch.callbacks import ModelCheckpoint
from calvera.bandits import NeuralTSBandit
# Initialize your bandit
bandit = NeuralTSBandit(
n_features=dataset.context_size,
network=network,
buffer=buffer,
# ... other parameters ...
)
# Create checkpoint callback
checkpoint_callback = ModelCheckpoint(
dirpath="./checkpoints", # Directory to save checkpoints
filename="neural_ts-{step}", # Filename pattern including the step number
save_top_k=3, # Save the top 3 models (based on monitor)
monitor="loss", # Monitor this metric for determining the "best"
mode="min", # Lower loss is better
every_n_train_steps=100, # Save every 100 training steps
)
# Create trainer with checkpoint callback
trainer = pl.Trainer(
max_epochs=1,
callbacks=[checkpoint_callback],
enable_checkpointing=True,
# ... other trainer parameters ...
)
# Run training
trainer.fit(bandit)
# Get the best checkpoint path
best_model_path = checkpoint_callback.best_model_path
print(f"Best checkpoint saved at: {best_model_path}")
Loading Checkpoints
You can load a model from a checkpoint to resume training or for inference:
# Load model from checkpoint
loaded_bandit = NeuralTSBandit.load_from_checkpoint(
best_model_path,
n_features=dataset.context_size,
network=network, # Provide the network architecture
buffer=buffer, # Provide a buffer (can be empty)
# ... any other required parameters not stored in the checkpoint ...
)
# Use the loaded model for inference
context = torch.randn(10, 5, dataset.context_size)
chosen_actions, probabilities = loaded_bandit(context)
# Or resume training
trainer = pl.Trainer(
max_epochs=1,
# ... other trainer parameters ...
)
trainer.fit(loaded_bandit)
What Gets Checkpointed
When a Calvera bandit is checkpointed, the following components are saved:
- Model Parameters: All learnable parameters of the bandit, including neural network weights
- Buffer State: The data buffer state, including stored contexts and rewards
- Linear Head Parameters: For linear and neural linear bandits, the precision matrix, b vector, and theta
- Selector State: The state of the action selector (e.g., epsilon value and RNG state for EpsilonGreedySelector)
- Training State: Counters and flags related to training progress
- Hyperparameters: The hyperparameters used to initialize the model
Checkpoint Management
For managing checkpoints over long experiments:
from lightning.pytorch.callbacks import ModelCheckpoint
# Save only the best model
checkpoint_callback = ModelCheckpoint(
dirpath="./checkpoints",
filename="best-{step}",
save_top_k=1,
monitor="loss",
mode="min",
)
# Also save the latest model periodically
latest_checkpoint_callback = ModelCheckpoint(
dirpath="./checkpoints",
filename="latest-{step}",
save_top_k=1,
save_last=True,
every_n_train_steps=500,
)
# Add both callbacks to the trainer
trainer = pl.Trainer(
callbacks=[checkpoint_callback, latest_checkpoint_callback],
enable_checkpointing=True,
# ... other trainer parameters ...
)
This setup ensures you always have the best-performing model saved, as well as regular snapshots of recent training progress.
Benchmarking
Calvera provides a benchmarking module to easily compare different bandit algorithms on various datasets. Here's how to use it:
from calvera.benchmark import run
# Benchmark a Linear UCB bandit on the Covertype dataset
run(
{
"bandit": "lin_ucb",
"dataset": "covertype",
"max_samples": 5000,
"feedback_delay": 1,
"train_batch_size": 1,
"forward_batch_size": 1,
"bandit_hparams": {
"alpha": 1.0,
},
}
)
# Benchmark a Neural Linear bandit on the Covertype dataset
run(
{
"bandit": "neural_linear",
"dataset": "covertype",
"network": "tiny_mlp",
"max_samples": 5000,
"feedback_delay": 1,
"train_batch_size": 1,
"forward_batch_size": 1,
"data_strategy": "sliding_window",
"sliding_window_size": 1,
"bandit_hparams": {
"n_embedding_size": 128,
},
}
)
The benchmark module provides pre-defined network architectures:
- none: Identity mapping
- linear: Simple linear layer
- tiny_mlp: Small MLP with one hidden layer (64 units)
- small_mlp: MLP with two hidden layers (128 units each)
- large_mlp: MLP with three hidden layers (256 units each)
- deep_mlp: Deep MLP with seven hidden layers (64 units each)
- bert: A small pre-trained BERT encoder (google/bert_uncased_L-2_H-128_A-2), wrapped in Calvera's BertWrapper for easy input shape handling
- resnet: A small pre-trained ResNet (resnet18.a1_in1k) without the final linear layer, wrapped in Calvera's ResNetWrapper for easy input shape handling
These can be specified in the benchmark configuration:
run(
{
"bandit": "neural_linear",
"dataset": "statlog",
"network": "tiny_mlp", # Specify the network architecture
# ... other parameters
}
)
It also supports the following bandits:
- lin_ucb: Linear UCB
- approx_lin_ucb: Diagonal Precision Approximation Linear UCB
- lin_ts: Linear Thompson Sampling
- approx_lin_ts: Diagonal Precision Approximation Linear Thompson Sampling
- neural_linear: Neural Linear
- neural_ucb: (Combinatorial + Standard) Neural UCB
- neural_ts: (Combinatorial + Standard) Neural Thompson Sampling
And the following datasets:
- covertype: Covertype dataset from UCI
- mnist: MNIST dataset
- statlog: Statlog dataset from UCI
- wheel: Synthetic Wheel Bandit dataset
- imdb: IMDB movie reviews dataset
- movielens: MovieLens dataset
- tiny_imagenet: Tiny ImageNet dataset
- synthetic: A binary classification synthetic dataset
- synthetic_combinatorial: Synthetic dataset for Combinatorial Bandit experiments
If you want to configure the benchmark with your own models, you can use the BanditBenchmark class:
import lightning.pytorch.loggers
from calvera.bandits import NeuralLinearBandit
from calvera.benchmark import BanditBenchmark
from calvera.benchmark.datasets import StatlogDataset
network = MyCustomNetwork()
bandit = NeuralLinearBandit(
network=network,
n_embedding_size=128, # size of your networks embeddings
# ...
)
dataset = StatlogDataset()
benchmark = BanditBenchmark(
bandit,
dataset,
training_params={
"max_samples": 5000, # on how many samples to train
"forward_batch_size": 1, # The batch size for the forward pass.
"feedback_delay": 1, # The number of samples to collect before training.
# trainer arguments:
"max_steps": 2,
"log_every_n_steps": 1,
"gradient_clip_val": 0.5,
"training_sampler": None, # or SortedDataSampler if the inputted data should not be in i.i.d. order
"device": "cuda",
"seed": 42,
},
logger=lightning.pytorch.loggers.CSVLogger(
# ...
)
)
benchmark.run()
Further Resources
- For more examples, check the examples directory on GitHub.
- For API documentation, see the API reference.
- For implementation details, see the source code.