CMA-ES Tutorial

Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is a state-of-the-art derivative-free algorithm for continuous optimization, and it is especially effective on non-separable and ill-conditioned problems.

When to Use CMA-ES

Ideal for:

  • Non-separable problems (variables are correlated)
  • Ill-conditioned problems (different scaling per variable)
  • Medium-dimensional problems (10-100 variables)
  • When you want automatic parameter adaptation

Not ideal for:

  • Very high-dimensional problems (n > 1000)
  • Discrete or combinatorial problems
  • Problems requiring custom genetic operators

How CMA-ES Works

CMA-ES maintains a multivariate normal distribution over the search space and iteratively:

  1. Sample new solutions from the distribution
  2. Evaluate and rank solutions by fitness
  3. Update the distribution mean toward better solutions
  4. Adapt the covariance matrix to learn variable correlations
  5. Adjust the step size (sigma) based on progress

The covariance matrix captures the shape of the fitness landscape, allowing efficient search even when variables are correlated.
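
To make the loop concrete, below is a deliberately simplified, self-contained sketch of steps 1-3 and 5. It omits step 4 (covariance adaptation), which is exactly what distinguishes full CMA-ES from a plain evolution strategy, and it uses a tiny built-in PRNG so it runs with no dependencies. This is toy code for illustration, not the fugue_evo implementation.

// Simplified sketch of the sample-rank-update loop (steps 1-3 and 5 only).
// Full CMA-ES additionally adapts the covariance matrix (step 4) and
// controls sigma via evolution paths.

// Tiny xorshift64 PRNG so this sketch needs no external crates.
struct TinyRng(u64);

impl TinyRng {
    fn next_f64(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        (x >> 11) as f64 / (1u64 << 53) as f64 // uniform in [0, 1)
    }

    // One standard-normal sample via the Box-Muller transform.
    fn std_normal(&mut self) -> f64 {
        let u1 = self.next_f64().max(1e-12);
        let u2 = self.next_f64();
        (-2.0 * u1.ln()).sqrt() * (std::f64::consts::TAU * u2).cos()
    }
}

fn sphere(x: &[f64]) -> f64 {
    x.iter().map(|v| v * v).sum()
}

fn main() {
    let mut rng = TinyRng(42);
    let dim = 5;
    let lambda = 4 + (3.0 * (dim as f64).ln()) as usize; // default-style population size
    let mu = lambda / 2;                                  // parents used for the mean update

    let mut mean = vec![0.5; dim];
    let mut sigma = 0.5;

    for generation in 0..200 {
        // 1-2. Sample lambda candidates around the mean and evaluate them.
        let mut pop: Vec<(f64, Vec<f64>)> = (0..lambda)
            .map(|_| {
                let x: Vec<f64> = mean.iter().map(|m| m + sigma * rng.std_normal()).collect();
                (sphere(&x), x)
            })
            .collect();

        // Rank by fitness (minimization).
        pop.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());

        // 3. Move the mean toward the best mu solutions.
        for d in 0..dim {
            mean[d] = pop[..mu].iter().map(|(_, x)| x[d]).sum::<f64>() / mu as f64;
        }

        // 5. Crude step-size decay (full CMA-ES uses cumulative step-size adaptation).
        sigma *= 0.98;

        if generation % 50 == 0 {
            println!("gen {:3}  best {:.6}  sigma {:.4}", generation, pop[0].0, sigma);
        }
    }
}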

Complete Example

//! CMA-ES Optimization Example
//!
//! This example demonstrates the Covariance Matrix Adaptation Evolution
//! Strategy (CMA-ES), a state-of-the-art derivative-free optimization
//! algorithm for continuous domains.
//!
//! CMA-ES adapts both the mean and covariance of a multivariate normal
//! distribution to efficiently search the fitness landscape.

use fugue_evo::prelude::*;
use rand::rngs::StdRng;
use rand::SeedableRng;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    println!("=== CMA-ES Optimization Example ===\n");

    let mut rng = StdRng::seed_from_u64(42);

    // Rosenbrock function - a classic test for optimization algorithms
    // The global minimum is at (1, 1, ..., 1) with value 0
    const DIM: usize = 10;

    println!("Problem: {}-D Rosenbrock function", DIM);
    println!("Global optimum: 0.0 at (1, 1, ..., 1)\n");

    // Create CMA-ES optimizer
    let initial_mean = vec![0.0; DIM]; // Start at origin
    let initial_sigma = 0.5; // Initial step size
    let bounds = MultiBounds::symmetric(5.0, DIM);

    // Create fitness function that CMA-ES can use
    let fitness = RosenbrockCmaEs { dim: DIM };

    let mut cmaes = CmaEs::new(initial_mean, initial_sigma).with_bounds(bounds);

    // Run optimization
    let best = cmaes.run_generations(&fitness, 1000, &mut rng)?;

    println!("Results:");
    println!("  Best fitness (minimized): {:.10}", best.fitness_value());
    println!("  Generations: {}", cmaes.state.generation);
    println!("  Evaluations: {}", cmaes.state.evaluations);
    println!("  Final sigma: {:.6}", cmaes.state.sigma);

    println!("\nBest solution:");
    for (i, val) in best.genome.genes().iter().enumerate().take(5) {
        println!("  x[{}] = {:.6}", i, val);
    }
    if DIM > 5 {
        println!("  ... ({} more dimensions)", DIM - 5);
    }

    // Calculate distance from optimal solution
    let distance_from_opt: f64 = best
        .genome
        .genes()
        .iter()
        .map(|x| (x - 1.0).powi(2))
        .sum::<f64>()
        .sqrt();

    println!("\nDistance from optimum: {:.10}", distance_from_opt);

    Ok(())
}

/// Rosenbrock function wrapper for CMA-ES
struct RosenbrockCmaEs {
    dim: usize,
}

impl CmaEsFitness for RosenbrockCmaEs {
    fn evaluate(&self, x: &RealVector) -> f64 {
        let genes = x.genes();
        let mut sum = 0.0;
        for i in 0..self.dim - 1 {
            let term1 = genes[i + 1] - genes[i] * genes[i];
            let term2 = 1.0 - genes[i];
            sum += 100.0 * term1 * term1 + term2 * term2;
        }
        sum // CMA-ES minimizes, so return the raw value (no negation needed)
    }
}

Source: examples/cma_es_example.rs

Running the Example

cargo run --example cma_es_example

Key Components

Initialization

let initial_mean = vec![0.0; DIM];  // Starting point
let initial_sigma = 0.5;             // Initial step size
let bounds = MultiBounds::symmetric(5.0, DIM);

let mut cmaes = CmaEs::new(initial_mean, initial_sigma)
    .with_bounds(bounds);

Parameters:

  • initial_mean: Starting center of the search distribution
  • initial_sigma: Initial standard deviation (step size)
  • bounds: Optional constraints on the search space
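
For instance, reusing only the constructors shown above, a run that starts from a prior guess in a tighter search region might look like this (the specific values are arbitrary and purely illustrative):

// Illustrative variation on the initialization above.
// MultiBounds::symmetric(r, DIM) is used as in the example, i.e. [-r, r] per dimension.
let initial_mean = vec![0.8; DIM];              // prior guess at a good region
let initial_sigma = 0.3;                        // narrower initial spread
let bounds = MultiBounds::symmetric(2.0, DIM);  // restrict search to [-2, 2]

let mut cmaes = CmaEs::new(initial_mean, initial_sigma)
    .with_bounds(bounds);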

Fitness Function

CMA-ES uses a different fitness trait than GA:

struct RosenbrockCmaEs { dim: usize }

impl CmaEsFitness for RosenbrockCmaEs {
    fn evaluate(&self, x: &RealVector) -> f64 {
        // Return value to MINIMIZE (not maximize!)
        let genes = x.genes();
        let mut sum = 0.0;
        for i in 0..self.dim - 1 {
            let term1 = genes[i + 1] - genes[i] * genes[i];
            let term2 = 1.0 - genes[i];
            sum += 100.0 * term1 * term1 + term2 * term2;
        }
        sum
    }
}

Important: CMA-ES minimizes by default (unlike GA which maximizes).
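
If your objective is something you want to maximize, the usual workaround is to return its negation. A minimal sketch using the CmaEsFitness trait shown above (the struct name and objective here are made up for illustration):

// Hypothetical example: maximize g(x) = sum of -(x_i - 2)^2 by returning -g(x),
// which CMA-ES then minimizes.
struct NegatedObjective;

impl CmaEsFitness for NegatedObjective {
    fn evaluate(&self, x: &RealVector) -> f64 {
        let g: f64 = x.genes().iter().map(|v| -(v - 2.0).powi(2)).sum();
        -g // minimizing -g(x) maximizes g(x)
    }
}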

Running Optimization

let best = cmaes.run_generations(&fitness, 1000, &mut rng)?;

println!("Best fitness: {:.10}", best.fitness_value());
println!("Final sigma: {:.6}", cmaes.state.sigma);

The Rosenbrock Function

The example optimizes the Rosenbrock function:

f(x) = Σ[100(xᵢ₊₁ - xᵢ²)² + (1 - xᵢ)²]

Properties:

  • Global minimum: f(1,1,...,1) = 0
  • Curved valley: The optimum lies in a narrow, curved valley
  • Non-separable: Variables are strongly coupled
  • Tests an algorithm's ability to learn variable correlations

CMA-ES excels at Rosenbrock because it learns the valley's shape through covariance adaptation.
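
A quick standalone check of the formula (independent of fugue_evo) shows the optimum and how quickly the value grows once a point leaves the valley floor:

// Standalone evaluation of the Rosenbrock sum above.
fn rosenbrock(x: &[f64]) -> f64 {
    x.windows(2)
        .map(|w| 100.0 * (w[1] - w[0] * w[0]).powi(2) + (1.0 - w[0]).powi(2))
        .sum()
}

fn main() {
    println!("{}", rosenbrock(&[1.0, 1.0, 1.0]));     // 0: the global optimum
    println!("{}", rosenbrock(&[0.9, 0.81, 0.6561])); // ~0.05: on the curved valley floor
    println!("{}", rosenbrock(&[0.9, 0.6, 0.36]));    // ~4.6: slightly off the valley floor
}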

Understanding CMA-ES Output

Sigma (Step Size)

println!("Final sigma: {:.6}", cmaes.state.sigma);
  • High sigma: Still exploring, large steps
  • Low sigma: Converging, fine-tuning
  • Oscillating sigma: May indicate problem with scale

Generations vs Evaluations

println!("Generations: {}", cmaes.state.generation);
println!("Evaluations: {}", cmaes.state.evaluations);

CMA-ES typically samples λ individuals per generation:

  • Default λ ≈ 4 + 3ln(n) for n dimensions
  • Total evaluations ≈ generations × λ
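
For the 10-dimensional example above, that works out roughly as follows (the exact rounding may differ slightly from the library's internal formula):

// Back-of-the-envelope check of the population-size heuristic above.
fn default_lambda(n: usize) -> usize {
    4 + (3.0 * (n as f64).ln()).floor() as usize
}

fn main() {
    let lambda = default_lambda(10); // 4 + floor(3 * ln 10) = 4 + 6 = 10
    println!("lambda ≈ {}", lambda);
    println!("≈ {} evaluations for 1000 generations", 1000 * lambda);
}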

Tuning CMA-ES

Initial Step Size (Sigma)

let initial_sigma = 0.5;

Guidelines:

  • The ±1σ spread should cover roughly a third of the search range (equivalently, sigma ≈ range / 6, so ±3σ spans the bounds)
  • Too small: Slow convergence, and the search may miss the global optimum
  • Too large: Wasted evaluations exploring infeasible regions

For bounds [-5, 5], this gives sigma = (5 - (-5)) / 6 ≈ 1.67 or smaller.
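
A tiny helper expressing this heuristic (an illustration, not part of the fugue_evo API):

// Heuristic only: pick sigma so that the ±3σ spread of the initial
// sampling distribution roughly spans the feasible interval.
fn sigma_from_bounds(lower: f64, upper: f64) -> f64 {
    (upper - lower) / 6.0
}

fn main() {
    println!("{:.2}", sigma_from_bounds(-5.0, 5.0)); // ≈ 1.67
}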

Initial Mean

let initial_mean = vec![0.0; DIM];

If you have prior knowledge about good regions, start there:

let initial_mean = vec![1.0; DIM];  // Near expected optimum

Population Size

The default population size is typically sufficient, but it can be increased for multimodal problems:

let mut cmaes = CmaEs::new(initial_mean, initial_sigma)
    .with_population_size(100);  // More exploration

Comparison with GA

Aspect         CMA-ES                      Simple GA
Operators      Automatic adaptation        Manual configuration
Correlations   Learns automatically        Requires special operators
Evaluations    Usually fewer               Usually more
Flexibility    Fixed framework             Highly customizable
Memory         O(n²)                       O(population × n)
Best for       Non-separable continuous    General purpose

Advanced Usage

Early Stopping

for gen in 0..max_generations {
    cmaes.step(&fitness, &mut rng)?;

    // Check for convergence
    if cmaes.state.sigma < 1e-8 {
        println!("Converged at generation {}", gen);
        break;
    }
}
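
Besides the sigma threshold above, another common stopping rule is fitness stagnation. A library-agnostic sketch (the StagnationDetector type is made up for illustration; feed it whatever best-fitness value you track per generation):

// Illustrative helper, not part of fugue_evo: stop after `patience`
// generations without an improvement larger than `tol`.
struct StagnationDetector {
    best: f64,
    stalled: usize,
    patience: usize,
    tol: f64,
}

impl StagnationDetector {
    fn new(patience: usize, tol: f64) -> Self {
        Self { best: f64::INFINITY, stalled: 0, patience, tol }
    }

    fn should_stop(&mut self, best_fitness_this_gen: f64) -> bool {
        if best_fitness_this_gen < self.best - self.tol {
            self.best = best_fitness_this_gen;
            self.stalled = 0;
        } else {
            self.stalled += 1;
        }
        self.stalled >= self.patience
    }
}

fn main() {
    let mut stop = StagnationDetector::new(3, 1e-12);
    for fitness in [5.0, 4.0, 4.0, 4.0, 4.0] {
        if stop.should_stop(fitness) {
            println!("stagnated at fitness {}", fitness);
        }
    }
}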

Accessing Distribution Parameters

let mean = &cmaes.state.mean;  // Current distribution center
let sigma = cmaes.state.sigma;  // Current step size
let generation = cmaes.state.generation;
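
These fields are handy for progress logging. A sketch combining them with the step() loop from the previous subsection (this assumes state.mean can be iterated as f64 values, which may differ in the actual API):

// Sketch: log convergence of the distribution center every 10 generations.
// Assumes `cmaes`, `fitness`, and `rng` are set up as in the complete example,
// and that `state.mean` yields f64 values when iterated.
for g in 0..200 {
    cmaes.step(&fitness, &mut rng)?;
    if g % 10 == 0 {
        let dist_to_opt: f64 = cmaes
            .state
            .mean
            .iter()
            .map(|m| (m - 1.0).powi(2))
            .sum::<f64>()
            .sqrt();
        println!(
            "gen {:3}  sigma {:.4}  |mean - optimum| {:.4}",
            cmaes.state.generation, cmaes.state.sigma, dist_to_opt
        );
    }
}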

Exercises

  1. Compare with GA: Run both CMA-ES and SimpleGA on Rosenbrock, compare evaluations needed
  2. Step size study: Try initial sigma values of 0.1, 0.5, 2.0, 5.0
  3. Higher dimensions: Increase DIM to 20, 50 and observe scaling

Next Steps