Parallel Evolution

This guide shows how to speed up evolution using parallel computation.

Enabling Parallelism

Parallel evaluation is enabled by default with the parallel feature:

[dependencies]
fugue-evo = { version = "0.1", features = ["parallel"] }

Parallel Fitness Evaluation

The simplest form of parallelism: evaluate multiple individuals simultaneously.

Using SimpleGA

let result = SimpleGABuilder::<RealVector, f64, _, _, _, _, _>::new()
    .population_size(100)
    .parallel(true)  // Enable parallel evaluation
    // ... other configuration
    .build()?
    .run(&mut rng)?;

Manual Parallel Evaluation

Using Rayon directly:

use rayon::prelude::*;

// Parallel evaluation
let fitnesses: Vec<f64> = population
    .par_iter()  // Parallel iterator
    .map(|individual| fitness.evaluate(individual.genome()))
    .collect();

// Update population with fitnesses
for (individual, fit) in population.iter_mut().zip(fitnesses) {
    individual.set_fitness(fit);
}

When Parallelism Helps

Good Candidates

Scenario                    | Speedup Potential
Expensive fitness (>10ms)   | High
Large population (>100)     | Medium-High
Independent evaluations     | High
Many CPU cores              | Scales with cores

Poor Candidates

Scenario                    | Why
Very cheap fitness (<1ms)   | Parallelization overhead dominates
Small population (<20)      | Not enough work to distribute
Dependent evaluations       | Can't parallelize
Memory-bound fitness        | Cores compete for memory
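
These guidelines can be turned into a rough runtime check. The sketch below is illustrative (the thresholds and the `parallelism_worthwhile` helper are not part of fugue-evo): time one evaluation, then enable parallelism only when the batch represents enough total work.

```rust
use std::time::{Duration, Instant};

// Hypothetical helper: decide whether parallel evaluation is likely to pay
// off. The thresholds mirror the rules of thumb above and are illustrative,
// not tuned constants.
fn parallelism_worthwhile(per_eval: Duration, population_size: usize) -> bool {
    let total_work = per_eval * population_size as u32;
    per_eval >= Duration::from_millis(1)      // fitness is not trivially cheap
        && population_size >= 20              // enough work to distribute
        && total_work >= Duration::from_millis(5) // batch is worth the overhead
}

fn main() {
    // Time a single (stand-in) fitness evaluation.
    let start = Instant::now();
    let _ = (0..1_000u64).map(|x| x * x).sum::<u64>();
    let per_eval = start.elapsed();

    println!("cheap eval, pop 100: {}", parallelism_worthwhile(per_eval, 100));
    println!("10ms eval, pop 100: {}", parallelism_worthwhile(Duration::from_millis(10), 100));
}
```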

Tuning Parallelism

Thread Pool Size

// Set number of threads
rayon::ThreadPoolBuilder::new()
    .num_threads(4)
    .build_global()
    .unwrap();

Chunk Size

For fine-grained control:

use rayon::prelude::*;

// Process in chunks
population
    .par_chunks(10)  // Process 10 at a time
    .for_each(|chunk| {
        for individual in chunk {
            // Evaluate
        }
    });

Island Model Parallelism

Islands naturally parallelize across populations:

use rand::thread_rng;
use rayon::prelude::*;

// Each island evolves independently
islands.par_iter_mut().for_each(|island| {
    island.evolve_generation(&mut thread_rng());
});

// Synchronous migration
migrate_between_islands(&mut islands);
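
`migrate_between_islands` is left undefined above. A minimal sketch of synchronous ring-topology migration might look like the following, where for brevity an individual is represented by just its fitness value:

```rust
// Ring-topology migration sketch: each island sends clones of its best
// `count` individuals to the next island, replacing that island's worst.
fn migrate_between_islands(islands: &mut [Vec<f64>], count: usize) {
    let n = islands.len();
    if n < 2 {
        return;
    }

    // Collect all migrants first, so migration is synchronous: every island
    // sends individuals chosen from its pre-migration state.
    let migrants: Vec<Vec<f64>> = islands
        .iter()
        .map(|island| {
            let mut sorted = island.clone();
            sorted.sort_by(|a, b| b.partial_cmp(a).unwrap()); // best first
            sorted.into_iter().take(count).collect()
        })
        .collect();

    for (i, incoming) in migrants.into_iter().enumerate() {
        let target = &mut islands[(i + 1) % n];
        // Drop the worst individuals on the target island, then add migrants.
        target.sort_by(|a, b| b.partial_cmp(a).unwrap());
        let keep = target.len().saturating_sub(incoming.len());
        target.truncate(keep);
        target.extend(incoming);
    }
}

fn main() {
    let mut islands = vec![vec![1.0, 2.0, 3.0], vec![10.0, 20.0, 30.0]];
    migrate_between_islands(&mut islands, 1);
    println!("{:?}", islands);
}
```

A real implementation would migrate whole individuals (genome plus fitness) and might pick migrants by tournament rather than strict elitism, but the synchronize-then-exchange structure is the same.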

Thread Safety Considerations

Fitness Functions

Your fitness function must be thread-safe (Send + Sync):

// Good: Stateless fitness
struct MyFitness;

impl Fitness<RealVector> for MyFitness {
    type Value = f64;

    fn evaluate(&self, genome: &RealVector) -> f64 {
        // Pure computation, no shared state
        genome.genes().iter().sum()
    }
}

// Good: Shared read-only data
struct DataDrivenFitness {
    data: Arc<Vec<f64>>,  // Shared immutable data
}

// Bad: Mutable shared state
struct BadFitness {
    counter: Cell<usize>,  // Not thread-safe!
}
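
If you genuinely need shared mutable state, such as an evaluation counter, atomics make it thread-safe. A sketch with a simplified `evaluate` signature (not the crate's `Fitness` trait):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Thread-safe variant of the counter example: AtomicUsize is Sync, so the
// fitness struct can be shared across Rayon worker threads without locks.
struct CountingFitness {
    evaluations: AtomicUsize,
}

impl CountingFitness {
    fn evaluate(&self, genome: &[f64]) -> f64 {
        // Relaxed ordering suffices for a simple statistics counter.
        self.evaluations.fetch_add(1, Ordering::Relaxed);
        genome.iter().sum()
    }
}

fn main() {
    let fitness = CountingFitness { evaluations: AtomicUsize::new(0) };
    let population = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let total: f64 = population.iter().map(|g| fitness.evaluate(g)).sum();
    println!("evaluations = {}", fitness.evaluations.load(Ordering::Relaxed)); // prints: evaluations = 2
    println!("total = {}", total);
}
```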

Thread-Local State

For fitness functions needing mutable state:

use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    static CACHE: RefCell<HashMap<u64, f64>> = RefCell::new(HashMap::new());
}

impl Fitness<RealVector> for CachedFitness {
    type Value = f64;

    fn evaluate(&self, genome: &RealVector) -> f64 {
        let key = hash(genome);

        CACHE.with(|cache| {
            if let Some(&cached) = cache.borrow().get(&key) {
                return cached;
            }

            let result = expensive_compute(genome);
            cache.borrow_mut().insert(key, result);
            result
        })
    }
}

Benchmarking Parallelism

Measure actual speedup:

use rayon::prelude::*;
use std::time::Instant;

fn benchmark_parallel_vs_sequential(population: &[Individual<RealVector>], fitness: &impl Fitness<RealVector>) {
    // Sequential
    let start = Instant::now();
    for ind in population {
        fitness.evaluate(ind.genome());
    }
    let sequential_time = start.elapsed();

    // Parallel
    let start = Instant::now();
    population.par_iter().for_each(|ind| {
        fitness.evaluate(ind.genome());
    });
    let parallel_time = start.elapsed();

    println!("Sequential: {:?}", sequential_time);
    println!("Parallel: {:?}", parallel_time);
    println!("Speedup: {:.2}x", sequential_time.as_secs_f64() / parallel_time.as_secs_f64());
}

GPU Acceleration

For very expensive computations, consider GPU:

// Pseudocode for GPU fitness
struct GpuFitness {
    device: CudaDevice,
    kernel: CompiledKernel,
}

impl GpuFitness {
    fn evaluate_batch(&self, genomes: &[RealVector]) -> Vec<f64> {
        // Copy genomes to GPU
        let gpu_data = self.device.copy_to_device(genomes);

        // Run kernel
        let gpu_results = self.kernel.run(&gpu_data);

        // Copy results back
        self.device.copy_from_device(&gpu_results)
    }
}

Parallel Evolution Patterns

Pattern 1: Parallel Evaluation Only

for gen in 0..max_generations {
    // Sequential selection and reproduction
    let offspring = create_offspring(&population, &mut rng);

    // Parallel evaluation
    offspring.par_iter_mut().for_each(|ind| {
        let fit = fitness.evaluate(ind.genome());
        ind.set_fitness(fit);
    });

    population = offspring;
}

Pattern 2: Multiple Independent Runs

let results: Vec<_> = (0..num_runs)
    .into_par_iter()
    .map(|run| {
        let mut rng = StdRng::seed_from_u64(run as u64);
        run_evolution(&fitness, &mut rng)
    })
    .collect();

let best = results.iter()
    .max_by(|a, b| a.best_fitness.partial_cmp(&b.best_fitness).unwrap());

Pattern 3: Speculative Execution

Run multiple strategies, keep best:

let strategies = vec![
    Strategy::HighMutation,
    Strategy::HighCrossover,
    Strategy::Balanced,
];

let results: Vec<_> = strategies
    .par_iter()
    .map(|strategy| {
        run_with_strategy(&fitness, strategy, &mut thread_rng())
    })
    .collect();

// Keep only the best result across strategies
let best = results.iter()
    .max_by(|a, b| a.best_fitness.partial_cmp(&b.best_fitness).unwrap());

Common Issues

Race Conditions

Symptom: Non-deterministic results or crashes

Solution: Ensure thread-safe fitness functions

High Overhead

Symptom: Parallel slower than sequential

Solution:

  • Increase population size
  • Use more expensive fitness
  • Reduce thread count

Memory Pressure

Symptom: Slowdown with many threads

Solution:

  • Limit thread count
  • Reduce population size
  • Stream results instead of collecting

Next Steps