I am trying to understand how evolution and fitness function work. I implemented my own Knapsack problem, which tries to find 3 most valuable items from a given set. It is my own cost function:
static int counter=1;
static double MyCostFunction(boolean[] chromosome, double[] data){
System.out.println("COST FUNCTION: " counter );
double sum = 0;
int count = 0;
for (int i =0; i<chromosome.length; i ){
if (chromosome[i]){
sum = data[i];
count ;
}
}
return count<=3 ? sum : 0;
}
I’m printing out just to see how many times MyCostFunction is performed. Right now I am working with a small set of 8 items. Here it is:
public static void main(String[] args) {
double[] data = {-15,-7, -1, -1,7,100,200,300};
Integer[] items = new Integer[data.length];
int i = 0;
for (double d : data){
items[i] = i;
i ;
}
ISeq<Integer> zbiorDlaGA = ISeq.of(items);
final ISProblem knapsack= new ISProblem(zbiorDlaGA,
chromosome -> ISProblem.MyCostFunction( chromosome, data)
);
This is the Engine that I’m using:
final Engine<BitGene,Double> engine= Engine.builder(knapsack)
.executor( Runnable::run)
.populationSize(5)
.survivorsSelector(new TournamentSelector<>(3))
.offspringSelector(new RouletteWheelSelector<>())
.alterers(
new Mutator<>(1.0),
new SinglePointCrossover<>(0.0)
).build();
And that’s how I’m obtaining statistics:
final EvolutionStatistics<Double,?> statistics=EvolutionStatistics.ofNumber();
final Phenotype<BitGene,Double> best=engine.stream()
// .limit(bySteadyFitness(15))
.limit(5)
.peek(r-> System.out.println("########CURRENT GEN: "
r.generation()
": " r.totalGenerations()
": " r.bestPhenotype()
" ALTERED: " r.alterCount()
" INVALID: " r.invalidCount()
" GENOTYPE: " r.genotypes()))
.peek(statistics)
.collect(toBestPhenotype());
System.out.println(statistics);
System.out.println(best);
I've got a few questions concerning some behaviours of your library that I don't really understand: Even when I set mutation probability to 1.0 (it should change every bit, I think so) it gives me this result:
COST FUNCTION: 1
COST FUNCTION: 2
COST FUNCTION: 3
COST FUNCTION: 4
COST FUNCTION: 5
COST FUNCTION: 6
COST FUNCTION: 7
COST FUNCTION: 8
########CURRENT GEN: 1: 1: [01100000] -> 300.0 ALTERED: 24 INVALID: 0 GENOTYPE: [[10110100],[00011011],[01011101],[00111001],[01100000]]
COST FUNCTION: 9
COST FUNCTION: 10
COST FUNCTION: 11
########CURRENT GEN: 2: 2: [00011011] -> 0.0 ALTERED: 24 INVALID: 0 GENOTYPE: [[00011011],[01011101],[10100101],[01111010],[00001011]]
COST FUNCTION: 12
COST FUNCTION: 13
COST FUNCTION: 14
########CURRENT GEN: 3: 3: [10100101] -> 0.0 ALTERED: 24 INVALID: 0 GENOTYPE: [[10100101],[01011101],[01111011],[01010011],[10010101]]
COST FUNCTION: 15
COST FUNCTION: 16
COST FUNCTION: 17
########CURRENT GEN: 4: 4: [10000010] -> 293.0 ALTERED: 24 INVALID: 0 GENOTYPE: [[01011101],[01010011],[01111011],[10000010],[00001001]]
COST FUNCTION: 18
COST FUNCTION: 19
COST FUNCTION: 20
########CURRENT GEN: 5: 5: [10000010] -> 293.0 ALTERED: 24 INVALID: 0 GENOTYPE: [[10000010],[01011101],[01000100],[10111100],[10011001]]
---------------------------------------------------------------------------
| Time statistics |
---------------------------------------------------------------------------
| Selection: sum=0,003352000000 s; mean=0,000670400000 s |
| Altering: sum=0,010647600000 s; mean=0,002129520000 s |
| Fitness calculation: sum=0,011403300000 s; mean=0,002280660000 s |
| Overall execution: sum=0,033579600000 s; mean=0,006715920000 s |
---------------------------------------------------------------------------
| Evolution statistics |
---------------------------------------------------------------------------
| Generations: 5 |
| Altered: sum=120; mean=24,000000000 |
| Killed: sum=0; mean=0,000000000 |
| Invalids: sum=0; mean=0,000000000 |
---------------------------------------------------------------------------
| Population statistics |
---------------------------------------------------------------------------
| Age: max=4; mean=0,560000; var=1,090000 |
| Fitness: |
| min = -23,000000000000 |
| max = 300,000000000000 |
| mean = 41,840000000000 |
| var = 10763,306666666665 |
| std = 103,746357365773 |
---------------------------------------------------------------------------
[01100000] -> 300.0
Why does it calculate fitness for three individuals only? Is it somehow caching the values already calculated for the same chromosome?
And why in the first generation it is calculated 8 times?
The number of altered individuals is always 3*8, what implies that only 3 chromosomes were modified during the evolution. Why? I thought it has something to do with TournamentSelector sampleSize, but it doesn’t change number of changed chromosomes.
With the probability of mutation equal to 100% and crossover prob.=100% it should change every bit in chromosome, so there will only be 2 versions of chromosome in each generation, but it doesn’t work like that. Why? Does it randomly select value of bit or sets the opposite?
Does the number of bits set to true have to be a constant one(or approximately constant) in population/generation?
I am using these strange values for crossover and mutation probabilities, because I was wondering earlier why it didn’t give the number of fitness calculations performed equal to populationSize*NumberOfGenerations. So I started to experiment.
CodePudding user response:
The described behavior is as expected :)
The following facts leads to the observed outcome:
- During the evolution process, the population is divided into offspring and survivors.
- Only the offspring is target for the alterers (mutation operation).
- The default offspring fraction is set to 0.6, which makes an offspring count of 3 for your example.
- The fitness values of unchanged individuals is cached.
The outcome:
- The reason for the first five fitness evaluations is induced by the fresh, initial population of 5.
- The next three evaluations are due to the three offspring individuals, which has been mutated with 100% certainty.
- The subsequent evaluation blocks of three evaluations have the same reason.
I think this explains the output, doesn't it?