# Q：简单遗传算法适应度函数

I'm trying to do a very easy genetic algorithm in c (for school research project). I am kind of stuck on calculating the fitness percentage.

I'm trying to match a random string from user input, with a dictionary word. (one could imagine a scrabble game algorithm or anything else)

For instance when the user input is "hello" and the dictionary word "hello", both strings match and a fitness of 100% should be correct. With "hellp" and "hello" a fitness of almost 100% and with "uryyb" fitness should be (far) below 100%.

Perhaps does anybody know how to do fitness function or know (general) reference of this sort of fitness functions?

Here I allocate memory for an array of dictionary words

``````int row;

//alloceer eerst amount_words void *
woorden = (char **) malloc( amount_words * (len + 1) );

for( row = 0; row <= amount_words; row++ )
woorden[row] = (char *) malloc ( len + 1 );

return;
``````

these could also be freed:

``````int row;

for( row = 0; row <= amount_words; row++ )
free( woorden[row] );

free( woorden );

return;
``````

Here I open a dictionary file.

``````FILE *f;
int amount_words = 0;
char woord[40];

f = fopen("words.txt", "r");

while(!feof(f)) {
fscanf( f, "%s\n", woord );
if( strlen(woord) == len ) {
amount_words++;
if( !is_valid_str( woord )  )
amount_words--;
}
}
fclose(f);

return amount_words;
``````

I rudely strip characters:

``````char is_valid_str( char *str  )
{
int i;
for( i=0; i <= zoek_str_len - 1; i++ )
if( str[i] < 'a' || str[i] > 'z' )
return FALSE;
return TRUE;
}
``````

I calculate the amount of words of certain length

``````amount_len_words( int len )
{
FILE *f;
int amount_words = 0;
char woord[40];

f = fopen("words.txt", "r");

while(!feof(f)) {
fscanf( f, "%s\n", woord );
if( strlen(woord) == len ) {
amount_words++;
if( !is_valid_str( woord )  )
amount_words--;
}
}
fclose(f);

return amount_words;

}
``````

I read an array of words, certain length

``````FILE *f;
int i=0;
int lenwords;
char woord[40];

lenwords = amount_len_words( len );
alloc_woorden( lenwords, len );

f = fopen("words.txt", "r");
while( !feof( f ) ) {
fscanf(f,"%s\n", woord );
if( strlen(woord) == len ) {
if( is_valid_str( woord ) ) {
strncpy(woorden[i++], woord, len);
//printf("->%s\n", woorden[i]);
}
}
}

for( i=0;i < lenwords;i++) {
printf("%s\n", woorden[i] );
}
``````

Here is the main routine

``````int i;
char zoek_str[40];

if( argc <= 1 ) {
printf( "gebruik: %s zoek_string\n", argv[0] );
return 0;
}

if( strlen( argv[1] ) > 39 ) {
printf( "Zoek string maximaal 39 lowercase karakters.\n" );
return 0;
}

strcpy( zoek_str, argv[1] );
zoek_str_len = strlen ( zoek_str );

if( !is_valid_str( zoek_str ) ) {
printf( "Ongeldige zoek string. Neemt alleen lowercase karakters!\n" );
return 0;
}

printf("%s\n",zoek_str);

init_words( zoek_str_len );

return 0;
}
``````

These two are the functions I'm currently puzzling about:

``````double calculate_fitness( char *zoek )
{

}
``````

And

``````void mutate( char *arg )
{

}
``````

Thereafter I would calculate generation by generation.

Note that I only search at fixed length strings ex: strlen(argv[1])

example output of all of this could be:

``````generation    string     word      percentage

1             hfllr      hello     89.4%

2             hellq      hello     90.3%

3             hellp      hello     95.3%

4             hello      hello     100%
``````

or something like that.

For instance when the user input is "hello" and the dictionary word "hello", both strings match and a fitness of 100% should be correct. With "hellp" and "hello" a fitness of almost 100% and with "uryyb" fitness should be (far) below 100%.

``````int row;

//alloceer eerst amount_words void *
woorden = (char **) malloc( amount_words * (len + 1) );

for( row = 0; row <= amount_words; row++ )
woorden[row] = (char *) malloc ( len + 1 );

return;
``````

``````int row;

for( row = 0; row <= amount_words; row++ )
free( woorden[row] );

free( woorden );

return;
``````

``````FILE *f;
int amount_words = 0;
char woord[40];

f = fopen("words.txt", "r");

while(!feof(f)) {
fscanf( f, "%s\n", woord );
if( strlen(woord) == len ) {
amount_words++;
if( !is_valid_str( woord )  )
amount_words--;
}
}
fclose(f);

return amount_words;
``````

``````char is_valid_str( char *str  )
{
int i;
for( i=0; i <= zoek_str_len - 1; i++ )
if( str[i] < 'a' || str[i] > 'z' )
return FALSE;
return TRUE;
}
``````

``````amount_len_words( int len )
{
FILE *f;
int amount_words = 0;
char woord[40];

f = fopen("words.txt", "r");

while(!feof(f)) {
fscanf( f, "%s\n", woord );
if( strlen(woord) == len ) {
amount_words++;
if( !is_valid_str( woord )  )
amount_words--;
}
}
fclose(f);

return amount_words;

}
``````

``````FILE *f;
int i=0;
int lenwords;
char woord[40];

lenwords = amount_len_words( len );
alloc_woorden( lenwords, len );

f = fopen("words.txt", "r");
while( !feof( f ) ) {
fscanf(f,"%s\n", woord );
if( strlen(woord) == len ) {
if( is_valid_str( woord ) ) {
strncpy(woorden[i++], woord, len);
//printf("->%s\n", woorden[i]);
}
}
}

for( i=0;i < lenwords;i++) {
printf("%s\n", woorden[i] );
}
``````

``````int i;
char zoek_str[40];

if( argc <= 1 ) {
printf( "gebruik: %s zoek_string\n", argv[0] );
return 0;
}

if( strlen( argv[1] ) > 39 ) {
printf( "Zoek string maximaal 39 lowercase karakters.\n" );
return 0;
}

strcpy( zoek_str, argv[1] );
zoek_str_len = strlen ( zoek_str );

if( !is_valid_str( zoek_str ) ) {
printf( "Ongeldige zoek string. Neemt alleen lowercase karakters!\n" );
return 0;
}

printf("%s\n",zoek_str);

init_words( zoek_str_len );

return 0;
}
``````

``````double calculate_fitness( char *zoek )
{

}
``````

``````void mutate( char *arg )
{

}
``````

``````generation    string     word      percentage

1             hfllr      hello     89.4%

2             hellq      hello     90.3%

3             hellp      hello     95.3%

4             hello      hello     100%
``````

By comparing the two strings letter by letter a metric could be correct/max_length where 'correct' is the number of letters that match and 'max_length' is the length of the longest string.

For something more involved you could look up the concept of Edit distance.

see Edit distance

also see Levenshtein distance

Basically what you are trying to measure is the minimum number of operations required to transform one string into the other.

Levenshtein也看到距离