Hyperparameter Learning Tutorial

GA performance depends heavily on hyperparameters like mutation rate, crossover probability, and population size. This tutorial demonstrates online Bayesian learning of hyperparameters during evolution.

The Problem

Traditional approach: Set parameters once, hope they work.

Better approach: Learn optimal parameters from feedback during evolution.

Hyperparameter Control Methods

Fugue-evo supports several approaches (following Eiben et al.'s classification):

Method          Description             Example
Deterministic   Pre-defined schedule    Decay mutation over time
Adaptive        Rule-based adjustment   Increase mutation if stagnant
Self-Adaptive   Encode in genome        Parameters evolve with solutions
Bayesian        Statistical learning    Update beliefs from observations

This tutorial focuses on Bayesian learning with conjugate priors.
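
Before the full example, it helps to see what a conjugate update actually does. The sketch below is a hand-rolled Beta-Bernoulli learner for illustration only; the example itself uses fugue-evo's BetaPosterior, which wraps the same idea. A Beta(α, β) prior observed against a Bernoulli success/failure outcome stays a Beta distribution: a success increments α, a failure increments β.

// Hand-rolled Beta-Bernoulli learner (illustration only; the example
// below uses fugue-evo's BetaPosterior instead).
struct BetaBernoulli {
    alpha: f64, // pseudo-count of successes
    beta: f64,  // pseudo-count of failures
}

impl BetaBernoulli {
    fn new(alpha: f64, beta: f64) -> Self {
        Self { alpha, beta }
    }

    // Conjugate update: success bumps alpha, failure bumps beta.
    fn observe(&mut self, success: bool) {
        if success {
            self.alpha += 1.0;
        } else {
            self.beta += 1.0;
        }
    }

    // Posterior mean of the learned probability.
    fn mean(&self) -> f64 {
        self.alpha / (self.alpha + self.beta)
    }
}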

Complete Example

//! Hyperparameter Learning with Bayesian Adaptation
//!
//! This example demonstrates online Bayesian learning of GA hyperparameters.
//! The system learns optimal mutation rates based on observed fitness improvements.

use fugue_evo::prelude::*;
use rand::rngs::StdRng;
use rand::SeedableRng;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    println!("=== Bayesian Hyperparameter Learning ===\n");

    let mut rng = StdRng::seed_from_u64(42);

    const DIM: usize = 20;
    let fitness = Rastrigin::new(DIM);
    let bounds = MultiBounds::symmetric(5.12, DIM);

    println!("Problem: {}-D Rastrigin", DIM);
    println!("Learning optimal mutation rate via Beta posterior\n");

    // Initialize Bayesian learner for mutation rate
    // Prior: Beta(2, 2) centered around 0.5
    let mut mutation_posterior = BetaPosterior::new(2.0, 2.0);

    // Track statistics
    let mut successful_mutations = 0;
    let mut total_mutations = 0;

    // Initialize population
    let mut population: Population<RealVector, f64> = Population::random(100, &bounds, &mut rng);
    population.evaluate(&fitness);

    let selection = TournamentSelection::new(3);
    let crossover = SbxCrossover::new(15.0);

    // Initial mutation rate from prior
    let mut current_mutation_rate = mutation_posterior.mean();

    println!(
        "Initial mutation rate (prior mean): {:.4}",
        current_mutation_rate
    );
    println!();

    let max_generations = 200;
    let adaptation_interval = 20;

    for gen in 0..max_generations {
        // Sample mutation rate from current posterior every adaptation interval
        if gen > 0 && gen % adaptation_interval == 0 {
            current_mutation_rate = mutation_posterior.sample(&mut rng);
            println!(
                "Gen {:3}: Sampled mutation rate = {:.4} (posterior mean = {:.4})",
                gen,
                current_mutation_rate,
                mutation_posterior.mean()
            );
        }

        let selection_pool: Vec<_> = population.as_fitness_pairs();
        let mut new_pop: Population<RealVector, f64> = Population::with_capacity(100);

        // Elitism
        if let Some(best) = population.best() {
            new_pop.push(best.clone());
        }

        while new_pop.len() < 100 {
            let p1_idx = selection.select(&selection_pool, &mut rng);
            let p2_idx = selection.select(&selection_pool, &mut rng);

            let (mut c1, mut c2) = crossover
                .crossover(
                    &selection_pool[p1_idx].0,
                    &selection_pool[p2_idx].0,
                    &mut rng,
                )
                .genome()
                .unwrap_or_else(|| {
                    (
                        selection_pool[p1_idx].0.clone(),
                        selection_pool[p2_idx].0.clone(),
                    )
                });

            let parent1_fitness = selection_pool[p1_idx].1;
            let parent2_fitness = selection_pool[p2_idx].1;

            // Apply mutation with learned rate
            let mutation = GaussianMutation::new(0.1).with_probability(current_mutation_rate);
            mutation.mutate(&mut c1, &mut rng);
            mutation.mutate(&mut c2, &mut rng);

            // Evaluate children
            let child1_fitness = fitness.evaluate(&c1);
            let child2_fitness = fitness.evaluate(&c2);

            // Update Bayesian posterior based on improvement
            let improved1 = child1_fitness > parent1_fitness;
            let improved2 = child2_fitness > parent2_fitness;

            mutation_posterior.observe(improved1);
            mutation_posterior.observe(improved2);

            total_mutations += 2;
            if improved1 {
                successful_mutations += 1;
            }
            if improved2 {
                successful_mutations += 1;
            }

            new_pop.push(Individual::with_fitness(c1, child1_fitness));
            if new_pop.len() < 100 {
                new_pop.push(Individual::with_fitness(c2, child2_fitness));
            }
        }

        new_pop.set_generation(gen + 1);
        population = new_pop;
    }

    // Results
    println!("\n=== Results ===");
    let best = population.best().unwrap();
    println!("Best fitness: {:.6}", best.fitness_value());
    println!();

    println!("Learned hyperparameters:");
    println!(
        "  Final mutation rate (posterior mean): {:.4}",
        mutation_posterior.mean()
    );
    let ci = mutation_posterior.credible_interval(0.95);
    println!("  95% credible interval: [{:.4}, {:.4}]", ci.0, ci.1);
    println!();

    println!("Mutation statistics:");
    println!("  Total mutations: {}", total_mutations);
    println!("  Successful mutations: {}", successful_mutations);
    println!(
        "  Observed success rate: {:.4}",
        successful_mutations as f64 / total_mutations as f64
    );

    // Compare with fixed rates
    println!("\n--- Comparison with fixed mutation rates ---\n");

    for fixed_rate in [0.05, 0.1, 0.2, 0.5] {
        let result = run_with_fixed_rate(fixed_rate, DIM)?;
        println!("Fixed rate {:.2}: Best = {:.6}", fixed_rate, result);
    }

    println!(
        "\nLearned rate {:.2}: Best = {:.6}",
        mutation_posterior.mean(),
        best.fitness_value()
    );

    Ok(())
}

fn run_with_fixed_rate(rate: f64, dim: usize) -> Result<f64, Box<dyn std::error::Error>> {
    let mut rng = StdRng::seed_from_u64(42); // Same seed for fair comparison

    let fitness = Rastrigin::new(dim);
    let bounds = MultiBounds::symmetric(5.12, dim);

    let mut population: Population<RealVector, f64> = Population::random(100, &bounds, &mut rng);
    population.evaluate(&fitness);

    let selection = TournamentSelection::new(3);
    let crossover = SbxCrossover::new(15.0);
    let mutation = GaussianMutation::new(0.1).with_probability(rate);

    for gen in 0..200 {
        let selection_pool: Vec<_> = population.as_fitness_pairs();
        let mut new_pop: Population<RealVector, f64> = Population::with_capacity(100);

        if let Some(best) = population.best() {
            new_pop.push(best.clone());
        }

        while new_pop.len() < 100 {
            let p1_idx = selection.select(&selection_pool, &mut rng);
            let p2_idx = selection.select(&selection_pool, &mut rng);

            let (mut c1, mut c2) = crossover
                .crossover(
                    &selection_pool[p1_idx].0,
                    &selection_pool[p2_idx].0,
                    &mut rng,
                )
                .genome()
                .unwrap_or_else(|| {
                    (
                        selection_pool[p1_idx].0.clone(),
                        selection_pool[p2_idx].0.clone(),
                    )
                });

            mutation.mutate(&mut c1, &mut rng);
            mutation.mutate(&mut c2, &mut rng);

            new_pop.push(Individual::new(c1));
            if new_pop.len() < 100 {
                new_pop.push(Individual::new(c2));
            }
        }

        new_pop.evaluate(&fitness);
        new_pop.set_generation(gen + 1);
        population = new_pop;
    }

    Ok(*population.best().unwrap().fitness_value())
}

Source: examples/hyperparameter_learning.rs

Running the Example

cargo run --example hyperparameter_learning

Key Components

Beta Posterior for Mutation Rate

// Prior: Beta(2, 2) centered around 0.5
let mut mutation_posterior = BetaPosterior::new(2.0, 2.0);

The Beta distribution is a natural fit for learning a probability:

  • Domain: [0, 1] (valid probability range)
  • Conjugate to Bernoulli outcomes (success/failure)
  • Prior parameters encode initial beliefs

Beta(2, 2):

  • Mean = 0.5 (start uncertain)
  • Moderate confidence (equivalent to 4 observations)
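
Because α and β act as pseudo-counts, the prior's influence fades as real data accumulates: starting from Beta(2, 2) and observing 10 successes and 30 failures yields Beta(12, 32), whose mean 12/44 ≈ 0.27 is already dominated by the observations rather than the prior's pull toward 0.5.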

Observing Outcomes

// Check if mutation improved fitness
let improved = child_fitness > parent_fitness;

// Update posterior with observation
mutation_posterior.observe(improved);

Each observation updates the distribution:

  • Success (improvement): Increases mean
  • Failure: Decreases mean
  • More observations → narrower distribution
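
The narrowing is easy to quantify: a Beta(α, β) distribution has variance αβ / ((α + β)²(α + β + 1)). Beta(2, 2) has variance 4/80 = 0.05 (standard deviation ≈ 0.22); after 96 evenly split observations, Beta(50, 50) has variance ≈ 0.0025 (standard deviation ≈ 0.05), so the estimate is roughly 4.5× tighter.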

Sampling Parameters

// Sample mutation rate from current posterior
current_mutation_rate = mutation_posterior.sample(&mut rng);

Thompson Sampling: Sample from posterior, use as parameter.

  • Balances exploration (uncertainty) and exploitation (best estimate)
  • Naturally adapts as confidence grows
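
The same draw can be reproduced outside the library with the rand_distr crate; this is a sketch for intuition, not fugue-evo's implementation:

use rand::rngs::StdRng;
use rand::SeedableRng;
use rand_distr::{Beta, Distribution};

fn main() {
    let mut rng = StdRng::seed_from_u64(42);

    // A wide posterior (few observations) produces spread-out draws:
    // exploration. A concentrated posterior produces draws near its
    // mean: exploitation. Thompson sampling gets both for free.
    let wide = Beta::new(2.0, 2.0).expect("positive parameters");
    let tight = Beta::new(200.0, 200.0).expect("positive parameters");

    println!("wide draw:  {:.4}", wide.sample(&mut rng));
    println!("tight draw: {:.4}", tight.sample(&mut rng));
}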

Adaptation Interval

if gen > 0 && gen % adaptation_interval == 0 {
    current_mutation_rate = mutation_posterior.sample(&mut rng);
}

Don't update every generation:

  • Too frequent: Not enough signal
  • Too rare: Slow adaptation
  • Typical: Every 10-50 generations

Understanding the Output

Initial mutation rate (prior mean): 0.5000

Gen  20: Sampled mutation rate = 0.4123 (posterior mean = 0.3876)
Gen  40: Sampled mutation rate = 0.3456 (posterior mean = 0.3245)
...

=== Results ===
Learned hyperparameters:
  Final mutation rate (posterior mean): 0.2134
  95% credible interval: [0.1823, 0.2445]

Mutation statistics:
  Total mutations: 20000
  Successful mutations: 4268
  Observed success rate: 0.2134

The posterior mean converges to the observed success rate, so the learned mutation rate tracks how often mutations actually improve offspring.
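
The numbers check out: a Beta(2, 2) prior plus 4268 successes and 15732 failures gives a Beta(4270, 15734) posterior, whose mean 4270 / 20004 ≈ 0.2135 is essentially the raw success rate 4268 / 20000 = 0.2134, since 4 pseudo-observations are negligible against 20000 real ones.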

Credible Intervals

let ci = mutation_posterior.credible_interval(0.95);
println!("95% CI: [{:.4}, {:.4}]", ci.0, ci.1);

Unlike frequentist confidence intervals, Bayesian credible intervals have a direct interpretation: "95% probability the true value is in this range (given our data)."
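
If you need the same quantity outside the library, the endpoints are just posterior quantiles. A sketch using the statrs crate (an assumption; the library's credible_interval does this for you):

use statrs::distribution::{Beta, ContinuousCDF};

// Equal-tailed 95% credible interval for a Beta(alpha, beta) posterior:
// the 2.5% and 97.5% quantiles bracket 95% of the posterior mass.
fn credible_interval_95(alpha: f64, beta: f64) -> (f64, f64) {
    let dist = Beta::new(alpha, beta).expect("parameters must be positive");
    (dist.inverse_cdf(0.025), dist.inverse_cdf(0.975))
}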

Comparing with Fixed Rates

for fixed_rate in [0.05, 0.1, 0.2, 0.5] {
    let result = run_with_fixed_rate(fixed_rate, DIM)?;
    println!("Fixed rate {:.2}: Best = {:.6}", fixed_rate, result);
}

The learned rate often outperforms any single fixed rate because:

  1. It adapts to the problem
  2. It can change as evolution progresses
  3. It handles different phases (exploration vs. exploitation)

Other Learnable Parameters

Crossover Probability

let mut crossover_posterior = BetaPosterior::new(2.0, 2.0);

// Observe: did crossover produce better offspring than parents?
let offspring_better = child_fitness > parent1_fitness.max(parent2_fitness);
crossover_posterior.observe(offspring_better);
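
The learned probability would then gate the operator per pair. A generic sketch (the helper below is illustrative, not a fugue-evo API):

use rand::Rng;

// Apply `cross` with probability `prob`, otherwise pass parents through.
fn maybe_crossover<G: Clone, R: Rng>(
    p1: &G,
    p2: &G,
    prob: f64,
    cross: impl Fn(&G, &G, &mut R) -> (G, G),
    rng: &mut R,
) -> (G, G) {
    if rng.gen::<f64>() < prob {
        cross(p1, p2, rng) // recombine with the learned probability
    } else {
        (p1.clone(), p2.clone()) // no crossover this time
    }
}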

Tournament Size

Use a categorical posterior for discrete choices:

let tournament_sizes = [2, 3, 5, 7];
let mut size_weights = vec![1.0; tournament_sizes.len()];

// Update weights based on selection quality
// ... observe which sizes produce better offspring
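
One simple way to complete this, as a sketch: sample a size in proportion to its weight, then reward the chosen size whenever its generation produces an improvement. WeightedIndex is from the rand crate; the improvement signal and the +1 reward rule are illustrative choices, not a fugue-evo API.

use rand::distributions::{Distribution, WeightedIndex};
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

fn main() {
    let tournament_sizes = [2usize, 3, 5, 7];
    let mut size_weights = vec![1.0f64; tournament_sizes.len()];
    let mut rng = StdRng::seed_from_u64(42);

    for _gen in 0..100 {
        // Sample a tournament size proportionally to its current weight.
        let dist = WeightedIndex::new(&size_weights).expect("positive weights");
        let idx = dist.sample(&mut rng);
        let _tournament_size = tournament_sizes[idx];

        // ... run one generation with `_tournament_size` ...
        let offspring_improved = rng.gen_bool(0.5); // placeholder signal

        // Reinforce sizes whose generations produced improvements.
        if offspring_improved {
            size_weights[idx] += 1.0;
        }
    }

    println!("final weights: {size_weights:?}");
}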

Multiple Parameters

Learn multiple parameters simultaneously:

use rand::Rng;

struct AdaptiveGA {
    mutation_posterior: BetaPosterior,
    crossover_posterior: BetaPosterior,
    mutation_rate: f64,
    crossover_prob: f64,
    adaptation_interval: usize,
    // ... other parameters
}

impl AdaptiveGA {
    fn adapt<R: Rng>(&mut self, gen: usize, rng: &mut R) {
        if gen > 0 && gen % self.adaptation_interval == 0 {
            self.mutation_rate = self.mutation_posterior.sample(rng);
            self.crossover_prob = self.crossover_posterior.sample(rng);
        }
    }
}
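
In the generation loop this becomes a single call per iteration, e.g. adaptive.adapt(gen, &mut rng) before breeding, with the current rates then read from the struct's fields.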

Deterministic Schedules

For simpler adaptation, use time-based schedules:

use fugue_evo::hyperparameter::schedules::*;

// Linear decay: 0.5 → 0.05 over 500 generations
let schedule = LinearSchedule::new(0.5, 0.05, 500);
let rate = schedule.value_at(gen);

// Exponential decay
let schedule = ExponentialSchedule::new(0.5, 0.99, 500);

// Sigmoid decay
let schedule = SigmoidSchedule::new(0.5, 0.05, 500);
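
For reference, linear decay is just clamped interpolation; a hand-rolled equivalent, assuming LinearSchedule interpolates this way:

// Linear decay from `start` to `end` over `duration` generations,
// holding at `end` afterwards (sketch of the assumed semantics).
fn linear_value_at(start: f64, end: f64, duration: usize, gen: usize) -> f64 {
    let t = (gen as f64 / duration as f64).min(1.0);
    start + (end - start) * t
}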

When to Use Which Method

Scenario                     Recommended
Known good parameters        Fixed
Exploration → exploitation   Deterministic schedule
Problem-dependent optimum    Bayesian learning
No prior knowledge           Bayesian with weak prior
Fast prototyping             Adaptive rules

Exercises

  1. Prior sensitivity: Try Beta(1,1), Beta(5,5), Beta(10,1) priors
  2. Learning speed: Vary adaptation interval (5, 20, 50, 100)
  3. Multiple parameters: Learn both mutation and crossover rates

Next Steps