Genetic Programming Tutorial
Genetic Programming (GP) evolves tree-structured programs rather than fixed-length vectors. This tutorial demonstrates symbolic regression - discovering mathematical expressions that fit data.
When to Use Genetic Programming
Ideal for:
- Symbolic regression (finding formulas)
- Automated program synthesis
- Discovery of mathematical relationships
- Feature construction for machine learning
Challenges:
- Bloat (trees grow unnecessarily large)
- More complex operators
- Larger search spaces
The Problem: Symbolic Regression
Given input-output pairs, find a mathematical expression that fits the data.
Target function:
f(x) = x² + 2x + 1
Training data:
x: -5, -4, ..., 4, 5
y: f(x) for each x
Goal: Rediscover the formula (or an equivalent one) from data alone.
Complete Example
//! Symbolic Regression with Genetic Programming
//!
//! This example demonstrates genetic programming using tree genomes
//! to discover symbolic expressions that fit data.
//!
//! The goal is to find a mathematical expression that approximates
//! a target function from input-output examples.
use fugue_evo::prelude::*;
use rand::rngs::StdRng;
use rand::Rng;
use rand::SeedableRng;
fn main() -> Result<(), Box<dyn std::error::Error>> {
println!("=== Symbolic Regression with GP ===\n");
let mut rng = StdRng::seed_from_u64(42);
// Generate training data from target function: f(x) = x^2 + 2*x + 1
let training_data: Vec<(f64, f64)> = (-5..=5)
.map(|i| {
let x = i as f64;
let y = x * x + 2.0 * x + 1.0; // Target function
(x, y)
})
.collect();
println!("Target function: f(x) = x^2 + 2*x + 1");
println!("Training points: {}", training_data.len());
println!();
// Create fitness function for symbolic regression
let fitness = SymbolicRegressionFitness::new(training_data.clone());
// Create initial population of random trees
let mut population: Vec<TreeGenome<ArithmeticTerminal, ArithmeticFunction>> = Vec::new();
for _ in 0..100 {
// Mix of full and grow initialization
let tree = if rng.gen::<bool>() {
TreeGenome::generate_full(&mut rng, 3, 6)
} else {
TreeGenome::generate_grow(&mut rng, 5, 0.3)
};
population.push(tree);
}
// Evaluate initial population
let mut pop_with_fitness: Vec<(TreeGenome<ArithmeticTerminal, ArithmeticFunction>, f64)> =
population
.iter()
.map(|tree| (tree.clone(), fitness.evaluate(tree)))
.collect();
pop_with_fitness.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
println!("Initial population stats:");
println!(" Best fitness: {:.6}", pop_with_fitness[0].1);
println!(" Best expr: {}", pop_with_fitness[0].0.to_sexpr());
println!();
// Simple evolution loop
let max_generations = 50;
let tournament_size = 5;
let mutation_rate = 0.2;
let crossover_rate = 0.8;
for gen in 0..max_generations {
let mut new_pop = Vec::new();
// Elitism: keep best
new_pop.push(pop_with_fitness[0].0.clone());
while new_pop.len() < 100 {
// Tournament selection
let parent1 = tournament_select(&pop_with_fitness, tournament_size, &mut rng);
let parent2 = tournament_select(&pop_with_fitness, tournament_size, &mut rng);
let mut child = if rng.gen::<f64>() < crossover_rate {
// Subtree crossover
subtree_crossover(parent1, parent2, &mut rng)
} else {
parent1.clone()
};
// Point mutation
if rng.gen::<f64>() < mutation_rate {
point_mutate(&mut child, &mut rng);
}
// Limit tree depth
if child.depth() <= 10 {
new_pop.push(child);
} else {
new_pop.push(parent1.clone());
}
}
// Evaluate new population
pop_with_fitness = new_pop
.iter()
.map(|tree| (tree.clone(), fitness.evaluate(tree)))
.collect();
pop_with_fitness.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
if (gen + 1) % 10 == 0 {
println!(
"Gen {:3}: Best = {:.6}, Size = {}, Expr = {}",
gen + 1,
pop_with_fitness[0].1,
pop_with_fitness[0].0.size(),
truncate_expr(&pop_with_fitness[0].0.to_sexpr(), 50)
);
}
}
println!("\n=== Final Result ===");
let best = &pop_with_fitness[0];
println!("Best fitness: {:.6}", best.1);
println!("Best expression: {}", best.0.to_sexpr());
println!("Tree size: {} nodes", best.0.size());
println!("Tree depth: {}", best.0.depth());
// Test the discovered function
println!("\nComparison on test points:");
println!(
"{:>6} {:>12} {:>12} {:>12}",
"x", "Target", "Predicted", "Error"
);
for x in [-3.5, -1.0, 0.0, 1.0, 2.5] {
let target = x * x + 2.0 * x + 1.0;
let predicted = best.0.evaluate(&[x]);
let error = (target - predicted).abs();
println!(
"{:6.1} {:12.4} {:12.4} {:12.6}",
x, target, predicted, error
);
}
Ok(())
}
fn truncate_expr(s: &str, max_len: usize) -> String {
if s.len() <= max_len {
s.to_string()
} else {
format!("{}...", &s[..max_len - 3])
}
}
fn tournament_select<'a>(
pop: &'a [(TreeGenome<ArithmeticTerminal, ArithmeticFunction>, f64)],
size: usize,
rng: &mut StdRng,
) -> &'a TreeGenome<ArithmeticTerminal, ArithmeticFunction> {
use rand::seq::SliceRandom;
let contestants: Vec<_> = pop.choose_multiple(rng, size).collect();
let best = contestants
.iter()
.max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
.unwrap();
&best.0
}
fn subtree_crossover(
parent1: &TreeGenome<ArithmeticTerminal, ArithmeticFunction>,
parent2: &TreeGenome<ArithmeticTerminal, ArithmeticFunction>,
rng: &mut StdRng,
) -> TreeGenome<ArithmeticTerminal, ArithmeticFunction> {
let positions1 = parent1.root.positions();
let positions2 = parent2.root.positions();
if positions1.is_empty() || positions2.is_empty() {
return parent1.clone();
}
let pos1 = &positions1[rng.gen_range(0..positions1.len())];
let pos2 = &positions2[rng.gen_range(0..positions2.len())];
if let Some(subtree) = parent2.root.get_subtree(pos2) {
let mut new_root = parent1.root.clone();
new_root.replace_subtree(pos1, subtree.clone());
TreeGenome::new(new_root, parent1.max_depth.max(parent2.max_depth))
} else {
parent1.clone()
}
}
fn point_mutate(tree: &mut TreeGenome<ArithmeticTerminal, ArithmeticFunction>, rng: &mut StdRng) {
let positions = tree.root.positions();
if positions.is_empty() {
return;
}
let pos = &positions[rng.gen_range(0..positions.len())];
// Replace with random subtree
let new_subtree = if rng.gen::<bool>() {
TreeNode::terminal(ArithmeticTerminal::random(rng))
} else {
let func = ArithmeticFunction::random(rng);
let arity = func.arity();
let children: Vec<_> = (0..arity)
.map(|_| TreeNode::terminal(ArithmeticTerminal::random(rng)))
.collect();
TreeNode::function(func, children)
};
tree.root.replace_subtree(pos, new_subtree);
}
/// Fitness function for symbolic regression
struct SymbolicRegressionFitness {
data: Vec<(f64, f64)>,
}
impl SymbolicRegressionFitness {
fn new(data: Vec<(f64, f64)>) -> Self {
Self { data }
}
fn evaluate(&self, tree: &TreeGenome<ArithmeticTerminal, ArithmeticFunction>) -> f64 {
let mse: f64 = self
.data
.iter()
.map(|(x, y)| {
let predicted = tree.evaluate(&[*x]);
if predicted.is_finite() {
(y - predicted).powi(2)
} else {
1e6 // Penalty for invalid values
}
})
.sum::<f64>()
/ self.data.len() as f64;
// Negate MSE (we maximize fitness)
// Add small parsimony pressure to prefer simpler trees
let complexity_penalty = tree.size() as f64 * 0.001;
-mse - complexity_penalty
}
}
Source:
examples/symbolic_regression.rs
Running the Example
cargo run --example symbolic_regression
Key Components
Tree Genome
TreeGenome<ArithmeticTerminal, ArithmeticFunction>
A tree genome consists of:
- Terminals: Leaf nodes (variables like
X, constants like3.14) - Functions: Internal nodes (operations like
+,*,sin)
Example tree for x² + 1:
+
/ \
* 1
/ \
X X
Tree Generation
// Full method: all leaves at same depth
let tree = TreeGenome::generate_full(&mut rng, 3, 6);
// Grow method: variable depth
let tree = TreeGenome::generate_grow(&mut rng, 5, 0.3);
Full method:
- All branches have exactly the specified depth
- Creates bushy, complete trees
- Good initial diversity
Grow method:
- Terminates probabilistically
- Creates varied shapes
- More natural tree sizes
Fitness for Symbolic Regression
struct SymbolicRegressionFitness {
data: Vec<(f64, f64)>, // (x, y) pairs
}
impl SymbolicRegressionFitness {
fn evaluate(&self, tree: &TreeGenome<...>) -> f64 {
let mse: f64 = self.data.iter()
.map(|(x, y)| {
let predicted = tree.evaluate(&[*x]);
(y - predicted).powi(2)
})
.sum::<f64>() / self.data.len() as f64;
// Negate MSE (we maximize fitness)
// Add parsimony pressure for simpler trees
let complexity_penalty = tree.size() as f64 * 0.001;
-mse - complexity_penalty
}
}
Key elements:
- Mean Squared Error (MSE) measures fit
- Parsimony pressure discourages bloat
- Negated because GA maximizes
Genetic Operators for Trees
Subtree Crossover:
fn subtree_crossover(parent1: &Tree, parent2: &Tree, rng: &mut Rng) -> Tree {
// 1. Select random node in parent1
// 2. Select random node in parent2
// 3. Replace subtree at pos1 with subtree from pos2
}
Point Mutation:
fn point_mutate(tree: &mut Tree, rng: &mut Rng) {
// 1. Select random node
// 2. Replace with new random subtree
}
Understanding GP Output
Expression Representation
println!("Expression: {}", tree.to_sexpr());
S-expression format (Lisp-like):
(+ X 1)meansX + 1(* X X)meansX * X(+ (* X X) (+ (* 2 X) 1))meansX² + 2X + 1
Tree Metrics
println!("Tree size: {} nodes", tree.size());
println!("Tree depth: {}", tree.depth());
- Size: Total number of nodes
- Depth: Longest path from root to leaf
Watch for bloat - trees growing without fitness improvement.
Controlling Bloat
1. Parsimony Pressure
Penalize large trees in fitness:
let fitness = raw_fitness - tree.size() as f64 * penalty;
2. Depth Limits
Reject trees exceeding maximum depth:
if child.depth() <= 10 {
new_pop.push(child);
} else {
new_pop.push(parent.clone());
}
3. Tournament with Parsimony
Prefer smaller trees when fitness is similar:
fn tournament_with_parsimony(pop: &[Tree], k: usize) -> &Tree {
let contestants = sample(pop, k);
contestants.iter()
.max_by(|a, b| {
let fitness_cmp = a.fitness.cmp(&b.fitness);
if fitness_cmp == Equal {
b.size().cmp(&a.size()) // Prefer smaller
} else {
fitness_cmp
}
})
}
Available Functions and Terminals
Built-in Arithmetic
Terminals:
X: Input variableConstant(f64): Numeric constants
Functions:
Add: Binary additionSub: Binary subtractionMul: Binary multiplicationDiv: Protected division (returns 1 if divisor is 0)
Custom Function Sets
Define problem-specific primitives:
enum MyFunction {
Add, Sub, Mul, Div,
Sin, Cos, Exp, Log,
}
enum MyTerminal {
X,
Y, // Multiple variables
Constant(f64),
}
GP Parameters
| Parameter | Typical Range | Effect |
|---|---|---|
| Population | 100-1000 | More = better exploration |
| Tournament size | 3-7 | Selection pressure |
| Crossover rate | 0.7-0.9 | Combination vs mutation |
| Mutation rate | 0.1-0.3 | Exploration |
| Max depth | 6-17 | Complexity limit |
| Parsimony coefficient | 0.0001-0.01 | Bloat control |
Common Issues
Bloat
Symptoms: Trees grow huge with no fitness improvement
Solutions:
- Increase parsimony pressure
- Stricter depth limits
- Use tournament with size tie-breaker
Premature Convergence
Symptoms: Population becomes homogeneous early
Solutions:
- Larger population
- Higher mutation rate
- Lower selection pressure
Numeric Instability
Symptoms: NaN or Inf in evaluations
Solutions:
- Use protected division
- Clamp outputs to reasonable range
- Penalize extreme predictions
Exercises
- Different target: Try f(x) = sin(x) or f(x) = x³
- Multiple variables: Extend to f(x, y) = x² + y²
- More functions: Add
sin,cos,exp
Next Steps
- Interactive Evolution - Human-guided optimization
- Custom Genome Types - Create your own genomes