CMA-ES Tutorial

Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is a state-of-the-art derivative-free algorithm for continuous optimization, and it is especially effective on non-separable and ill-conditioned problems.

When to Use CMA-ES

Ideal for:

  • Non-separable problems (variables are correlated)
  • Ill-conditioned problems (different scaling per variable)
  • Medium-dimensional problems (10-100 variables)
  • When you want automatic parameter adaptation

Not ideal for:

  • Very high-dimensional problems (n > 1000)
  • Discrete or combinatorial problems
  • Problems requiring custom genetic operators

How CMA-ES Works

CMA-ES maintains a multivariate normal distribution over the search space and iteratively:

  1. Sample new solutions from the distribution
  2. Evaluate and rank solutions by fitness
  3. Update the distribution mean toward better solutions
  4. Adapt the covariance matrix to learn variable correlations
  5. Adjust the step size (sigma) based on progress

The covariance matrix captures the shape of the fitness landscape, allowing efficient search even when variables are correlated.
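
To make the loop concrete, below is a deliberately simplified, self-contained sketch of steps 1-3 and 5. It omits step 4 (covariance adaptation), which is exactly what distinguishes full CMA-ES from a plain evolution strategy, and it uses a tiny built-in PRNG so it runs with no dependencies. This is toy code for illustration, not the fugue_evo implementation.

// Simplified sketch of the sample-rank-update loop (steps 1-3 and 5 only).
// Full CMA-ES additionally adapts the covariance matrix (step 4) and
// controls sigma via evolution paths.

// Tiny xorshift64 PRNG so this sketch needs no external crates.
struct TinyRng(u64);

impl TinyRng {
    fn next_f64(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        (x >> 11) as f64 / (1u64 << 53) as f64 // uniform in [0, 1)
    }

    // One standard-normal sample via the Box-Muller transform.
    fn std_normal(&mut self) -> f64 {
        let u1 = self.next_f64().max(1e-12);
        let u2 = self.next_f64();
        (-2.0 * u1.ln()).sqrt() * (std::f64::consts::TAU * u2).cos()
    }
}

fn sphere(x: &[f64]) -> f64 {
    x.iter().map(|v| v * v).sum()
}

fn main() {
    let mut rng = TinyRng(42);
    let dim = 5;
    let lambda = 4 + (3.0 * (dim as f64).ln()) as usize; // default-style population size
    let mu = lambda / 2;                                  // parents used for the mean update

    let mut mean = vec![0.5; dim];
    let mut sigma = 0.5;

    for generation in 0..200 {
        // 1-2. Sample lambda candidates around the mean and evaluate them.
        let mut pop: Vec<(f64, Vec<f64>)> = (0..lambda)
            .map(|_| {
                let x: Vec<f64> = mean.iter().map(|m| m + sigma * rng.std_normal()).collect();
                (sphere(&x), x)
            })
            .collect();

        // Rank by fitness (minimization).
        pop.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());

        // 3. Move the mean toward the best mu solutions.
        for d in 0..dim {
            mean[d] = pop[..mu].iter().map(|(_, x)| x[d]).sum::<f64>() / mu as f64;
        }

        // 5. Crude step-size decay (full CMA-ES uses cumulative step-size adaptation).
        sigma *= 0.98;

        if generation % 50 == 0 {
            println!("gen {:3}  best {:.6}  sigma {:.4}", generation, pop[0].0, sigma);
        }
    }
}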

Complete Example

//! CMA-ES Optimization Example
//!
//! This example demonstrates the Covariance Matrix Adaptation Evolution
//! Strategy (CMA-ES), a state-of-the-art derivative-free optimization
//! algorithm for continuous domains.
//!
//! CMA-ES adapts both the mean and covariance of a multivariate normal
//! distribution to efficiently search the fitness landscape.

use fugue_evo::prelude::*;
use rand::rngs::StdRng;
use rand::SeedableRng;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    println!("=== CMA-ES Optimization Example ===\n");

    let mut rng = StdRng::seed_from_u64(42);

    // Rosenbrock function - a classic test for optimization algorithms
    // The global minimum is at (1, 1, ..., 1) with value 0
    const DIM: usize = 10;

    println!("Problem: {}-D Rosenbrock function", DIM);
    println!("Global optimum: 0.0 at (1, 1, ..., 1)\n");

    // Create CMA-ES optimizer
    let initial_mean = vec![0.0; DIM]; // Start at origin
    let initial_sigma = 0.5; // Initial step size
    let bounds = MultiBounds::symmetric(5.0, DIM);

    // Create fitness function that CMA-ES can use
    let fitness = RosenbrockCmaEs { dim: DIM };

    let mut cmaes = CmaEs::new(initial_mean, initial_sigma).with_bounds(bounds);

    // Run optimization
    let best = cmaes.run_generations(&fitness, 1000, &mut rng)?;

    println!("Results:");
    println!("  Best fitness (minimized): {:.10}", best.fitness_value());
    println!("  Generations: {}", cmaes.state.generation);
    println!("  Evaluations: {}", cmaes.state.evaluations);
    println!("  Final sigma: {:.6}", cmaes.state.sigma);

    println!("\nBest solution:");
    for (i, val) in best.genome.genes().iter().enumerate().take(5) {
        println!("  x[{}] = {:.6}", i, val);
    }
    if DIM > 5 {
        println!("  ... ({} more dimensions)", DIM - 5);
    }

    // Calculate distance from optimal solution
    let distance_from_opt: f64 = best
        .genome
        .genes()
        .iter()
        .map(|x| (x - 1.0).powi(2))
        .sum::<f64>()
        .sqrt();

    println!("\nDistance from optimum: {:.10}", distance_from_opt);

    Ok(())
}

/// Rosenbrock function wrapper for CMA-ES
struct RosenbrockCmaEs {
    dim: usize,
}

impl CmaEsFitness for RosenbrockCmaEs {
    fn evaluate(&self, x: &RealVector) -> f64 {
        let genes = x.genes();
        let mut sum = 0.0;
        for i in 0..self.dim - 1 {
            let term1 = genes[i + 1] - genes[i] * genes[i];
            let term2 = 1.0 - genes[i];
            sum += 100.0 * term1 * term1 + term2 * term2;
        }
        sum // CMA-ES minimizes, so return the raw value (no negation needed)
    }
}

Source: examples/cma_es_example.rs

Running the Example

cargo run --example cma_es_example

Key Components

Initialization

let initial_mean = vec![0.0; DIM];  // Starting point
let initial_sigma = 0.5;             // Initial step size
let bounds = MultiBounds::symmetric(5.0, DIM);

let mut cmaes = CmaEs::new(initial_mean, initial_sigma)
    .with_bounds(bounds);

Parameters:

  • initial_mean: Starting center of the search distribution
  • initial_sigma: Initial standard deviation (step size)
  • bounds: Optional constraints on the search space
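
For instance, reusing only the constructors shown above, a run that starts from a prior guess in a tighter search region might look like this (the specific values are arbitrary and purely illustrative):

// Illustrative variation on the initialization above.
// MultiBounds::symmetric(r, DIM) is used as in the example, i.e. [-r, r] per dimension.
let initial_mean = vec![0.8; DIM];              // prior guess at a good region
let initial_sigma = 0.3;                        // narrower initial spread
let bounds = MultiBounds::symmetric(2.0, DIM);  // restrict search to [-2, 2]

let mut cmaes = CmaEs::new(initial_mean, initial_sigma)
    .with_bounds(bounds);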

Fitness Function

CMA-ES uses a different fitness trait than GA:

struct RosenbrockCmaEs { dim: usize }

impl CmaEsFitness for RosenbrockCmaEs {
    fn evaluate(&self, x: &RealVector) -> f64 {
        // Return value to MINIMIZE (not maximize!)
        let genes = x.genes();
        let mut sum = 0.0;
        for i in 0..self.dim - 1 {
            let term1 = genes[i + 1] - genes[i] * genes[i];
            let term2 = 1.0 - genes[i];
            sum += 100.0 * term1 * term1 + term2 * term2;
        }
        sum
    }
}

Important: CMA-ES minimizes by default (unlike GA which maximizes).
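
If your objective is something you want to maximize, the usual workaround is to return its negation. A minimal sketch using the CmaEsFitness trait shown above (the struct name and objective here are made up for illustration):

// Hypothetical example: maximize g(x) = sum of -(x_i - 2)^2 by returning -g(x),
// which CMA-ES then minimizes.
struct NegatedObjective;

impl CmaEsFitness for NegatedObjective {
    fn evaluate(&self, x: &RealVector) -> f64 {
        let g: f64 = x.genes().iter().map(|v| -(v - 2.0).powi(2)).sum();
        -g // minimizing -g(x) maximizes g(x)
    }
}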

Running Optimization

let best = cmaes.run_generations(&fitness, 1000, &mut rng)?;

println!("Best fitness: {:.10}", best.fitness_value());
println!("Final sigma: {:.6}", cmaes.state.sigma);

The Rosenbrock Function

The example optimizes the Rosenbrock function:

f(x) = Σ[100(xᵢ₊₁ - xᵢ²)² + (1 - xᵢ)²]

Properties:

  • Global minimum: f(1,1,...,1) = 0
  • Curved valley: The optimum lies in a narrow, curved valley
  • Non-separable: Variables are strongly coupled
  • Tests an algorithm's ability to learn variable correlations

CMA-ES excels at Rosenbrock because it learns the valley's shape through covariance adaptation.
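
A quick standalone check of the formula (independent of fugue_evo) shows the optimum and how quickly the value grows once a point leaves the valley floor:

// Standalone evaluation of the Rosenbrock sum above.
fn rosenbrock(x: &[f64]) -> f64 {
    x.windows(2)
        .map(|w| 100.0 * (w[1] - w[0] * w[0]).powi(2) + (1.0 - w[0]).powi(2))
        .sum()
}

fn main() {
    println!("{}", rosenbrock(&[1.0, 1.0, 1.0]));     // 0: the global optimum
    println!("{}", rosenbrock(&[0.9, 0.81, 0.6561])); // ~0.05: on the curved valley floor
    println!("{}", rosenbrock(&[0.9, 0.6, 0.36]));    // ~4.6: slightly off the valley floor
}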

Understanding CMA-ES Output

Sigma (Step Size)

println!("Final sigma: {:.6}", cmaes.state.sigma);
  • High sigma: Still exploring, large steps
  • Low sigma: Converging, fine-tuning
  • Oscillating sigma: May indicate problem with scale

Generations vs Evaluations

println!("Generations: {}", cmaes.state.generation);
println!("Evaluations: {}", cmaes.state.evaluations);

CMA-ES typically samples λ individuals per generation:

  • Default λ ≈ 4 + 3ln(n) for n dimensions
  • Total evaluations ≈ generations × λ
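
For the 10-dimensional example above, that works out roughly as follows (the exact rounding may differ slightly from the library's internal formula):

// Back-of-the-envelope check of the population-size heuristic above.
fn default_lambda(n: usize) -> usize {
    4 + (3.0 * (n as f64).ln()).floor() as usize
}

fn main() {
    let lambda = default_lambda(10); // 4 + floor(3 * ln 10) = 4 + 6 = 10
    println!("lambda ≈ {}", lambda);
    println!("≈ {} evaluations for 1000 generations", 1000 * lambda);
}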

Tuning CMA-ES

Initial Step Size (Sigma)

let initial_sigma = 0.5;

Guidelines:

  • The ±1σ spread should cover roughly a third of the search range (equivalently, sigma ≈ range / 6, so ±3σ spans the bounds)
  • Too small: Slow convergence, and the search may miss the global optimum
  • Too large: Wasted evaluations exploring infeasible regions

For bounds [-5, 5], this gives sigma = (5 - (-5)) / 6 ≈ 1.67 or smaller.
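
A tiny helper expressing this heuristic (an illustration, not part of the fugue_evo API):

// Heuristic only: pick sigma so that the ±3σ spread of the initial
// sampling distribution roughly spans the feasible interval.
fn sigma_from_bounds(lower: f64, upper: f64) -> f64 {
    (upper - lower) / 6.0
}

fn main() {
    println!("{:.2}", sigma_from_bounds(-5.0, 5.0)); // ≈ 1.67
}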

Initial Mean

let initial_mean = vec![0.0; DIM];

If you have prior knowledge about good regions, start there:

let initial_mean = vec![1.0; DIM];  // Near expected optimum

Population Size

The default population size is typically sufficient, but it can be increased for multimodal problems:

let mut cmaes = CmaEs::new(initial_mean, initial_sigma)
    .with_population_size(100);  // More exploration

Comparison with GA

Aspect         CMA-ES                      Simple GA
Operators      Automatic adaptation        Manual configuration
Correlations   Learns automatically        Requires special operators
Evaluations    Usually fewer               Usually more
Flexibility    Fixed framework             Highly customizable
Memory         O(n²)                       O(population × n)
Best for       Non-separable continuous    General purpose

Advanced Usage

Early Stopping

for gen in 0..max_generations {
    cmaes.step(&fitness, &mut rng)?;

    // Check for convergence
    if cmaes.state.sigma < 1e-8 {
        println!("Converged at generation {}", gen);
        break;
    }
}
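
Besides the sigma threshold above, another common stopping rule is fitness stagnation. A library-agnostic sketch (the StagnationDetector type is made up for illustration; feed it whatever best-fitness value you track per generation):

// Illustrative helper, not part of fugue_evo: stop after `patience`
// generations without an improvement larger than `tol`.
struct StagnationDetector {
    best: f64,
    stalled: usize,
    patience: usize,
    tol: f64,
}

impl StagnationDetector {
    fn new(patience: usize, tol: f64) -> Self {
        Self { best: f64::INFINITY, stalled: 0, patience, tol }
    }

    fn should_stop(&mut self, best_fitness_this_gen: f64) -> bool {
        if best_fitness_this_gen < self.best - self.tol {
            self.best = best_fitness_this_gen;
            self.stalled = 0;
        } else {
            self.stalled += 1;
        }
        self.stalled >= self.patience
    }
}

fn main() {
    let mut stop = StagnationDetector::new(3, 1e-12);
    for fitness in [5.0, 4.0, 4.0, 4.0, 4.0] {
        if stop.should_stop(fitness) {
            println!("stagnated at fitness {}", fitness);
        }
    }
}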

Accessing Distribution Parameters

let mean = &cmaes.state.mean;  // Current distribution center
let sigma = cmaes.state.sigma;  // Current step size
let generation = cmaes.state.generation;
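
These fields are handy for progress logging. A sketch combining them with the step() loop from the previous subsection (this assumes state.mean can be iterated as f64 values, which may differ in the actual API):

// Sketch: log convergence of the distribution center every 10 generations.
// Assumes `cmaes`, `fitness`, and `rng` are set up as in the complete example,
// and that `state.mean` yields f64 values when iterated.
for g in 0..200 {
    cmaes.step(&fitness, &mut rng)?;
    if g % 10 == 0 {
        let dist_to_opt: f64 = cmaes
            .state
            .mean
            .iter()
            .map(|m| (m - 1.0).powi(2))
            .sum::<f64>()
            .sqrt();
        println!(
            "gen {:3}  sigma {:.4}  |mean - optimum| {:.4}",
            cmaes.state.generation, cmaes.state.sigma, dist_to_opt
        );
    }
}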

Exercises

  1. Compare with GA: Run both CMA-ES and SimpleGA on Rosenbrock, compare evaluations needed
  2. Step size study: Try initial sigma values of 0.1, 0.5, 2.0, 5.0
  3. Higher dimensions: Increase DIM to 20, 50 and observe scaling

Next Steps