Hyperparameter Learning Tutorial
GA performance depends heavily on hyperparameters like mutation rate, crossover probability, and population size. This tutorial demonstrates online Bayesian learning of hyperparameters during evolution.
The Problem
Traditional approach: set parameters once and hope they work.
Better approach: learn effective parameter values from feedback gathered during evolution.
Hyperparameter Control Methods
Fugue-evo supports several approaches (following Eiben et al.'s classification):
| Method | Description | Example |
|---|---|---|
| Deterministic | Pre-defined schedule | Decay mutation over time |
| Adaptive | Rule-based adjustment | Increase mutation if stagnant |
| Self-Adaptive | Encode in genome | Parameters evolve with solutions |
| Bayesian | Statistical learning | Update beliefs from observations |
This tutorial focuses on Bayesian learning with conjugate priors.
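Conjugate priors make the learning step trivial: for a Bernoulli success/failure signal, the conjugate prior is a Beta distribution, and the posterior update is just incrementing a count. The stand-in struct below illustrates the idea; it is not fugue-evo's BetaPosterior implementation.
// Illustrative Beta-Bernoulli posterior (a stand-in, not fugue-evo's type).
struct BetaBelief {
    alpha: f64, // pseudo-count of successes
    beta: f64,  // pseudo-count of failures
}

impl BetaBelief {
    // Conjugate update: a Bernoulli observation just increments one count.
    fn observe(&mut self, success: bool) {
        if success {
            self.alpha += 1.0;
        } else {
            self.beta += 1.0;
        }
    }

    // Posterior mean of Beta(alpha, beta).
    fn mean(&self) -> f64 {
        self.alpha / (self.alpha + self.beta)
    }
}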
Complete Example
//! Hyperparameter Learning with Bayesian Adaptation
//!
//! This example demonstrates online Bayesian learning of GA hyperparameters.
//! The system learns optimal mutation rates based on observed fitness improvements.
use fugue_evo::prelude::*;
use rand::rngs::StdRng;
use rand::SeedableRng;
fn main() -> Result<(), Box<dyn std::error::Error>> {
println!("=== Bayesian Hyperparameter Learning ===\n");
let mut rng = StdRng::seed_from_u64(42);
const DIM: usize = 20;
let fitness = Rastrigin::new(DIM);
let bounds = MultiBounds::symmetric(5.12, DIM);
println!("Problem: {}-D Rastrigin", DIM);
println!("Learning optimal mutation rate via Beta posterior\n");
// Initialize Bayesian learner for mutation rate
// Prior: Beta(2, 2) centered around 0.5
let mut mutation_posterior = BetaPosterior::new(2.0, 2.0);
// Track statistics
let mut successful_mutations = 0;
let mut total_mutations = 0;
// Initialize population
let mut population: Population<RealVector, f64> = Population::random(100, &bounds, &mut rng);
population.evaluate(&fitness);
let selection = TournamentSelection::new(3);
let crossover = SbxCrossover::new(15.0);
// Initial mutation rate from prior
let mut current_mutation_rate = mutation_posterior.mean();
println!(
"Initial mutation rate (prior mean): {:.4}",
current_mutation_rate
);
println!();
let max_generations = 200;
let adaptation_interval = 20;
for gen in 0..max_generations {
// Sample mutation rate from current posterior every adaptation interval
if gen > 0 && gen % adaptation_interval == 0 {
current_mutation_rate = mutation_posterior.sample(&mut rng);
println!(
"Gen {:3}: Sampled mutation rate = {:.4} (posterior mean = {:.4})",
gen,
current_mutation_rate,
mutation_posterior.mean()
);
}
let selection_pool: Vec<_> = population.as_fitness_pairs();
let mut new_pop: Population<RealVector, f64> = Population::with_capacity(100);
// Elitism
if let Some(best) = population.best() {
new_pop.push(best.clone());
}
while new_pop.len() < 100 {
let p1_idx = selection.select(&selection_pool, &mut rng);
let p2_idx = selection.select(&selection_pool, &mut rng);
let (mut c1, mut c2) = crossover
.crossover(
&selection_pool[p1_idx].0,
&selection_pool[p2_idx].0,
&mut rng,
)
.genome()
.unwrap_or_else(|| {
(
selection_pool[p1_idx].0.clone(),
selection_pool[p2_idx].0.clone(),
)
});
let parent1_fitness = selection_pool[p1_idx].1;
let parent2_fitness = selection_pool[p2_idx].1;
// Apply mutation with learned rate
let mutation = GaussianMutation::new(0.1).with_probability(current_mutation_rate);
mutation.mutate(&mut c1, &mut rng);
mutation.mutate(&mut c2, &mut rng);
// Evaluate children
let child1_fitness = fitness.evaluate(&c1);
let child2_fitness = fitness.evaluate(&c2);
// Update the Bayesian posterior: a mutation counts as a success when the
// child's fitness exceeds its parent's (fitness values are maximized here)
let improved1 = child1_fitness > parent1_fitness;
let improved2 = child2_fitness > parent2_fitness;
mutation_posterior.observe(improved1);
mutation_posterior.observe(improved2);
total_mutations += 2;
if improved1 {
successful_mutations += 1;
}
if improved2 {
successful_mutations += 1;
}
new_pop.push(Individual::with_fitness(c1, child1_fitness));
if new_pop.len() < 100 {
new_pop.push(Individual::with_fitness(c2, child2_fitness));
}
}
new_pop.set_generation(gen + 1);
population = new_pop;
}
// Results
println!("\n=== Results ===");
let best = population.best().unwrap();
println!("Best fitness: {:.6}", best.fitness_value());
println!();
println!("Learned hyperparameters:");
println!(
" Final mutation rate (posterior mean): {:.4}",
mutation_posterior.mean()
);
let ci = mutation_posterior.credible_interval(0.95);
println!(" 95% credible interval: [{:.4}, {:.4}]", ci.0, ci.1);
println!();
println!("Mutation statistics:");
println!(" Total mutations: {}", total_mutations);
println!(" Successful mutations: {}", successful_mutations);
println!(
" Observed success rate: {:.4}",
successful_mutations as f64 / total_mutations as f64
);
// Compare with fixed rates
println!("\n--- Comparison with fixed mutation rates ---\n");
for fixed_rate in [0.05, 0.1, 0.2, 0.5] {
let result = run_with_fixed_rate(fixed_rate, DIM)?;
println!("Fixed rate {:.2}: Best = {:.6}", fixed_rate, result);
}
println!(
"\nLearned rate {:.2}: Best = {:.6}",
mutation_posterior.mean(),
best.fitness_value()
);
Ok(())
}
fn run_with_fixed_rate(rate: f64, dim: usize) -> Result<f64, Box<dyn std::error::Error>> {
let mut rng = StdRng::seed_from_u64(42); // Same seed for fair comparison
let fitness = Rastrigin::new(dim);
let bounds = MultiBounds::symmetric(5.12, dim);
let mut population: Population<RealVector, f64> = Population::random(100, &bounds, &mut rng);
population.evaluate(&fitness);
let selection = TournamentSelection::new(3);
let crossover = SbxCrossover::new(15.0);
let mutation = GaussianMutation::new(0.1).with_probability(rate);
for gen in 0..200 {
let selection_pool: Vec<_> = population.as_fitness_pairs();
let mut new_pop: Population<RealVector, f64> = Population::with_capacity(100);
if let Some(best) = population.best() {
new_pop.push(best.clone());
}
while new_pop.len() < 100 {
let p1_idx = selection.select(&selection_pool, &mut rng);
let p2_idx = selection.select(&selection_pool, &mut rng);
let (mut c1, mut c2) = crossover
.crossover(
&selection_pool[p1_idx].0,
&selection_pool[p2_idx].0,
&mut rng,
)
.genome()
.unwrap_or_else(|| {
(
selection_pool[p1_idx].0.clone(),
selection_pool[p2_idx].0.clone(),
)
});
mutation.mutate(&mut c1, &mut rng);
mutation.mutate(&mut c2, &mut rng);
new_pop.push(Individual::new(c1));
if new_pop.len() < 100 {
new_pop.push(Individual::new(c2));
}
}
new_pop.evaluate(&fitness);
new_pop.set_generation(gen + 1);
population = new_pop;
}
Ok(*population.best().unwrap().fitness_value())
}
Running the Example
cargo run --example hyperparameter_learning
Key Components
Beta Posterior for Mutation Rate
// Prior: Beta(2, 2) centered around 0.5
let mut mutation_posterior = BetaPosterior::new(2.0, 2.0);
The Beta distribution is perfect for learning probabilities:
- Domain: [0, 1] (valid probability range)
- Conjugate to Bernoulli outcomes (success/failure)
- Prior parameters encode initial beliefs
Beta(2, 2):
- Mean = 0.5 (start uncertain)
- Moderate confidence (the prior carries the weight of only 4 pseudo-observations)
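The prior's pseudo-counts control how much evidence it takes to move the posterior. A small self-contained sketch (plain Rust, not library code) compares three priors after the same record of 30 successes in 100 trials:
// Posterior mean under a Beta(alpha0, beta0) prior after observing
// `successes` and `failures` Bernoulli outcomes.
fn posterior_mean(alpha0: f64, beta0: f64, successes: f64, failures: f64) -> f64 {
    (alpha0 + successes) / (alpha0 + beta0 + successes + failures)
}

fn main() {
    for (a, b) in [(1.0, 1.0), (2.0, 2.0), (10.0, 10.0)] {
        // Stronger priors pull the posterior mean back toward 0.5.
        println!("Beta({a}, {b}) prior -> {:.4}", posterior_mean(a, b, 30.0, 70.0));
    }
}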
Observing Outcomes
// Check if mutation improved fitness
let improved = child_fitness > parent_fitness;
// Update posterior with observation
mutation_posterior.observe(improved);
Each observation updates the distribution:
- Success (improvement): Increases mean
- Failure: Decreases mean
- More observations → narrower distribution
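The narrowing follows directly from the closed-form Beta variance, αβ / ((α+β)² (α+β+1)). This standalone sketch (not library code) feeds a steady 20% success rate into a Beta(2, 2) prior and watches the standard deviation shrink:
// Standard deviation of a Beta(a, b) posterior as observations accumulate.
fn beta_std_dev(a: f64, b: f64) -> f64 {
    (a * b / ((a + b).powi(2) * (a + b + 1.0))).sqrt()
}

fn main() {
    for n in [0u32, 100, 1_000, 10_000] {
        // 20% successes, 80% failures on top of the Beta(2, 2) prior.
        let (a, b) = (2.0 + 0.2 * n as f64, 2.0 + 0.8 * n as f64);
        println!("after {n:>5} observations: std dev = {:.4}", beta_std_dev(a, b));
    }
}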
Sampling Parameters
// Sample mutation rate from current posterior
current_mutation_rate = mutation_posterior.sample(&mut rng);
Thompson Sampling: Sample from posterior, use as parameter.
- Balances exploration (uncertainty) and exploitation (best estimate)
- Naturally adapts as confidence grows
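Under the hood this is just a draw from a Beta distribution. Here is a standalone sketch using the rand_distr crate (an assumption; in fugue-evo you would call BetaPosterior::sample directly), with hypothetical posterior counts in the same ballpark as the run shown later:
use rand::rngs::StdRng;
use rand::SeedableRng;
use rand_distr::{Beta, Distribution};

fn main() {
    let mut rng = StdRng::seed_from_u64(42);
    // Hypothetical posterior counts: Beta(2, 2) prior plus 4268 successes
    // and 15732 failures.
    let posterior = Beta::new(4270.0, 15734.0).expect("valid shape parameters");
    let rate: f64 = posterior.sample(&mut rng);
    println!("sampled mutation rate: {rate:.4}");
}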
Adaptation Interval
if gen > 0 && gen % adaptation_interval == 0 {
    current_mutation_rate = mutation_posterior.sample(&mut rng);
}
Don't update every generation:
- Too frequent: Not enough signal
- Too rare: Slow adaptation
- Typical: Every 10-50 generations
Understanding the Output
Initial mutation rate (prior mean): 0.5000
Gen 20: Sampled mutation rate = 0.4123 (posterior mean = 0.3876)
Gen 40: Sampled mutation rate = 0.3456 (posterior mean = 0.3245)
...
=== Results ===
Learned hyperparameters:
Final mutation rate (posterior mean): 0.2134
95% credible interval: [0.1823, 0.2445]
Mutation statistics:
Total mutations: 20000
Successful mutations: 4268
Observed success rate: 0.2134
The posterior converges toward the empirically optimal rate: with a Beta(2, 2) prior, 4,268 successes, and 15,732 failures, the posterior mean is (2 + 4268) / (4 + 20000) ≈ 0.213, tracking the observed success rate because the prior contributes only four pseudo-observations.
Credible Intervals
let ci = mutation_posterior.credible_interval(0.95);
println!("95% CI: [{:.4}, {:.4}]", ci.0, ci.1);
Unlike frequentist confidence intervals, Bayesian credible intervals have a direct interpretation: "95% probability the true value is in this range (given our data)."
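If you need the interval outside the library, an equal-tailed interval falls out of the Beta quantile function. A sketch assuming the statrs crate (fugue-evo's credible_interval makes this unnecessary in practice):
use statrs::distribution::{Beta, ContinuousCDF};

fn main() {
    // Same hypothetical posterior counts as in the sampling sketch above.
    let posterior = Beta::new(4270.0, 15734.0).expect("valid shape parameters");
    // Equal-tailed 95% interval: the 2.5% and 97.5% quantiles.
    let (lo, hi) = (posterior.inverse_cdf(0.025), posterior.inverse_cdf(0.975));
    println!("95% credible interval: [{lo:.4}, {hi:.4}]");
}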
Comparing with Fixed Rates
for fixed_rate in [0.05, 0.1, 0.2, 0.5] {
    let result = run_with_fixed_rate(fixed_rate, DIM)?;
println!("Fixed rate {:.2}: Best = {:.6}", fixed_rate, result);
}
The learned rate often outperforms any single fixed rate because:
- It adapts to the problem
- It can change as evolution progresses
- It handles different phases (exploration vs. exploitation)
Other Learnable Parameters
Crossover Probability
let mut crossover_posterior = BetaPosterior::new(2.0, 2.0);
// Observe: did crossover produce better offspring than parents?
let offspring_better = child_fitness > parent1_fitness.max(parent2_fitness);
crossover_posterior.observe(offspring_better);
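One way this might be wired into the generation loop, as a fragment in the style of the main example (`recombine` is a hypothetical stand-in for the SbxCrossover call, and gen_bool assumes rand 0.8's Rng trait is in scope):
// Sample a crossover probability from the posterior, then gate the
// operator on a biased coin flip (`recombine` is a hypothetical helper).
let crossover_prob = crossover_posterior.sample(&mut rng);
let (c1, c2) = if rng.gen_bool(crossover_prob) {
    recombine(&parent1, &parent2, &mut rng)
} else {
    (parent1.clone(), parent2.clone())
};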
Tournament Size
Use a categorical posterior for discrete choices:
let tournament_sizes = [2, 3, 5, 7];
let mut size_weights = vec![1.0; tournament_sizes.len()];
// Update weights based on selection quality, e.g. by keeping one Beta
// posterior per size and observing which sizes produce better offspring
// (see the sketch below)
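A concrete way to realize this is Thompson sampling over the discrete arms, with one Beta posterior per tournament size. This is a self-contained sketch assuming the rand_distr crate, not a fugue-evo API:
use rand::rngs::StdRng;
use rand::SeedableRng;
use rand_distr::{Beta, Distribution};

// One Beta posterior per discrete choice; pick the arm whose sampled
// success probability is highest (Thompson sampling over arms).
struct DiscreteThompson {
    counts: Vec<(f64, f64)>, // (alpha, beta) per arm
}

impl DiscreteThompson {
    fn new(arms: usize) -> Self {
        Self { counts: vec![(1.0, 1.0); arms] } // uniform Beta(1, 1) priors
    }

    // Draw one sample per arm, then pick the arm with the highest draw.
    fn pick(&self, rng: &mut StdRng) -> usize {
        let draws: Vec<f64> = self
            .counts
            .iter()
            .map(|&(a, b)| Beta::new(a, b).unwrap().sample(&mut *rng))
            .collect();
        draws
            .iter()
            .enumerate()
            .max_by(|x, y| x.1.partial_cmp(y.1).unwrap())
            .map(|(i, _)| i)
            .unwrap()
    }

    fn observe(&mut self, arm: usize, success: bool) {
        if success {
            self.counts[arm].0 += 1.0;
        } else {
            self.counts[arm].1 += 1.0;
        }
    }
}

fn main() {
    let mut rng = StdRng::seed_from_u64(42);
    let tournament_sizes = [2, 3, 5, 7];
    let mut learner = DiscreteThompson::new(tournament_sizes.len());
    let arm = learner.pick(&mut rng);
    println!("try tournament size {}", tournament_sizes[arm]);
    learner.observe(arm, true); // e.g. offspring beat its parent
}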
Multiple Parameters
Learn multiple parameters simultaneously:
struct AdaptiveGA {
    mutation_posterior: BetaPosterior,
    crossover_posterior: BetaPosterior,
    mutation_rate: f64,
    crossover_prob: f64,
    adaptation_interval: usize,
    // ... other parameters
}

impl AdaptiveGA {
    fn adapt(&mut self, gen: usize, rng: &mut impl rand::Rng) {
        if gen > 0 && gen % self.adaptation_interval == 0 {
            self.mutation_rate = self.mutation_posterior.sample(rng);
            self.crossover_prob = self.crossover_posterior.sample(rng);
        }
    }
}
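Hypothetical usage, matching the fields sketched above:
let mut ga = AdaptiveGA {
    mutation_posterior: BetaPosterior::new(2.0, 2.0),
    crossover_posterior: BetaPosterior::new(2.0, 2.0),
    mutation_rate: 0.5,
    crossover_prob: 0.5,
    adaptation_interval: 20,
};
for gen in 0..200 {
    ga.adapt(gen, &mut rng);
    // ... run one generation using ga.mutation_rate and ga.crossover_prob,
    // then feed improvement observations back into the posteriors
}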
Deterministic Schedules
For simpler adaptation, use time-based schedules:
use fugue_evo::hyperparameter::schedules::*;
// Linear decay: 0.5 → 0.05 over 500 generations
let schedule = LinearSchedule::new(0.5, 0.05, 500);
let rate = schedule.value_at(gen);
// Exponential decay
let schedule = ExponentialSchedule::new(0.5, 0.99, 500);
// Sigmoid decay
let schedule = SigmoidSchedule::new(0.5, 0.05, 500);
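If you prefer not to pull in the schedule types, linear decay is one line of arithmetic (a standalone sketch, not the library implementation):
// Linear decay from `start` to `end` over `total` generations, held at
// `end` afterwards.
fn linear_decay(start: f64, end: f64, total: usize, gen: usize) -> f64 {
    let t = (gen as f64 / total as f64).min(1.0);
    start + (end - start) * t
}

// e.g. linear_decay(0.5, 0.05, 500, 250) == 0.275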
When to Use Which Method
| Scenario | Recommended |
|---|---|
| Known good parameters | Fixed |
| Exploration→exploitation | Deterministic schedule |
| Problem-dependent optimal | Bayesian learning |
| No prior knowledge | Bayesian with weak prior |
| Fast prototyping | Adaptive rules |
Exercises
- Prior sensitivity: Try Beta(1,1), Beta(5,5), Beta(10,1) priors
- Learning speed: Vary adaptation interval (5, 20, 50, 100)
- Multiple parameters: Learn both mutation and crossover rates
Next Steps
- Custom Operators - Create learnable custom operators
- Advanced Algorithms - Built-in adaptive algorithms