找到你要的答案

Q:easy genetic algorithm fitness function

Q:简单遗传算法适应度函数

I'm trying to do a very easy genetic algorithm in c (for school research project). I am kind of stuck on calculating the fitness percentage.

I'm trying to match a random string from user input, with a dictionary word. (one could imagine a scrabble game algorithm or anything else)

For instance when the user input is "hello" and the dictionary word "hello", both strings match and a fitness of 100% should be correct. With "hellp" and "hello" a fitness of almost 100% and with "uryyb" fitness should be (far) below 100%.

Perhaps does anybody know how to do fitness function or know (general) reference of this sort of fitness functions?


Here I allocate memory for an array of dictionary words

int row;

//alloceer eerst amount_words void *
woorden = (char **) malloc( amount_words * (len + 1) );

for( row = 0; row <= amount_words; row++ ) 
woorden[row] = (char *) malloc ( len + 1 );

return;

these could also be freed:

int row;

for( row = 0; row <= amount_words; row++ ) 
free( woorden[row] );

free( woorden );

return;

Here I open a dictionary file.

FILE *f;
int amount_words = 0;
char woord[40];

f = fopen("words.txt", "r");

while(!feof(f)) {
fscanf( f, "%s\n", woord );
if( strlen(woord) == len ) {
    amount_words++;
    if( !is_valid_str( woord )  )
    amount_words--;
}
}
fclose(f);

return amount_words;

I rudely strip characters:

char is_valid_str( char *str  )
{ 
    int i;
    for( i=0; i <= zoek_str_len - 1; i++ ) 
    if( str[i] < 'a' || str[i] > 'z' )
        return FALSE;
    return TRUE;
}

I calculate the amount of words of certain length

amount_len_words( int len )
{
FILE *f;
int amount_words = 0;
char woord[40];

f = fopen("words.txt", "r");

while(!feof(f)) {
fscanf( f, "%s\n", woord );
if( strlen(woord) == len ) {
    amount_words++;
    if( !is_valid_str( woord )  )
    amount_words--;
}
}
fclose(f);

return amount_words;

}

I read an array of words, certain length

FILE *f;
int i=0;
int lenwords;
char woord[40];

lenwords = amount_len_words( len );
alloc_woorden( lenwords, len );

f = fopen("words.txt", "r");
while( !feof( f ) ) {
    fscanf(f,"%s\n", woord );
    if( strlen(woord) == len ) {
    if( is_valid_str( woord ) ) {
    strncpy(woorden[i++], woord, len); 
    //printf("->%s\n", woorden[i]);
    }
}
}

for( i=0;i < lenwords;i++) {
printf("%s\n", woorden[i] );
}

Here is the main routine

int i;
char zoek_str[40];

if( argc <= 1 ) {
printf( "gebruik: %s zoek_string\n", argv[0] );
return 0;
}

if( strlen( argv[1] ) > 39 ) {
printf( "Zoek string maximaal 39 lowercase karakters.\n" );
return 0;
}

strcpy( zoek_str, argv[1] );
zoek_str_len = strlen ( zoek_str );

if( !is_valid_str( zoek_str ) ) {
printf( "Ongeldige zoek string. Neemt alleen lowercase karakters!\n" ); 
return 0;
}

printf("%s\n",zoek_str);

init_words( zoek_str_len );

return 0;
}

These two are the functions I'm currently puzzling about:

double calculate_fitness( char *zoek )
{

}

And

void mutate( char *arg )
{

}

Thereafter I would calculate generation by generation.

Note that I only search at fixed length strings ex: strlen(argv[1])

example output of all of this could be:

generation    string     word      percentage

1             hfllr      hello     89.4%

2             hellq      hello     90.3%

3             hellp      hello     95.3%

4             hello      hello     100%

or something like that.

我试图在C(学校研究项目)做一个非常简单的遗传算法。我有点困在计算健身比例。

我试图从用户输入匹配随机字符串,用字典词。(你可以想象一个拼字游戏算法或者别的什么东西)

For instance when the user input is "hello" and the dictionary word "hello", both strings match and a fitness of 100% should be correct. With "hellp" and "hello" a fitness of almost 100% and with "uryyb" fitness should be (far) below 100%.

也许有人知道如何做健身功能或知道(一般)参考这种健身功能?


这里我为字典单词的数组分配内存

int row;

//alloceer eerst amount_words void *
woorden = (char **) malloc( amount_words * (len + 1) );

for( row = 0; row <= amount_words; row++ ) 
woorden[row] = (char *) malloc ( len + 1 );

return;

这些也可以被释放:

int row;

for( row = 0; row <= amount_words; row++ ) 
free( woorden[row] );

free( woorden );

return;

这里我打开一个字典文件。

FILE *f;
int amount_words = 0;
char woord[40];

f = fopen("words.txt", "r");

while(!feof(f)) {
fscanf( f, "%s\n", woord );
if( strlen(woord) == len ) {
    amount_words++;
    if( !is_valid_str( woord )  )
    amount_words--;
}
}
fclose(f);

return amount_words;

我粗暴的漫画人物:

char is_valid_str( char *str  )
{ 
    int i;
    for( i=0; i <= zoek_str_len - 1; i++ ) 
    if( str[i] < 'a' || str[i] > 'z' )
        return FALSE;
    return TRUE;
}

我计算一定长度的单词的数量

amount_len_words( int len )
{
FILE *f;
int amount_words = 0;
char woord[40];

f = fopen("words.txt", "r");

while(!feof(f)) {
fscanf( f, "%s\n", woord );
if( strlen(woord) == len ) {
    amount_words++;
    if( !is_valid_str( woord )  )
    amount_words--;
}
}
fclose(f);

return amount_words;

}

我读了一系列单词,一定长度

FILE *f;
int i=0;
int lenwords;
char woord[40];

lenwords = amount_len_words( len );
alloc_woorden( lenwords, len );

f = fopen("words.txt", "r");
while( !feof( f ) ) {
    fscanf(f,"%s\n", woord );
    if( strlen(woord) == len ) {
    if( is_valid_str( woord ) ) {
    strncpy(woorden[i++], woord, len); 
    //printf("->%s\n", woorden[i]);
    }
}
}

for( i=0;i < lenwords;i++) {
printf("%s\n", woorden[i] );
}

这里是主要例程

int i;
char zoek_str[40];

if( argc <= 1 ) {
printf( "gebruik: %s zoek_string\n", argv[0] );
return 0;
}

if( strlen( argv[1] ) > 39 ) {
printf( "Zoek string maximaal 39 lowercase karakters.\n" );
return 0;
}

strcpy( zoek_str, argv[1] );
zoek_str_len = strlen ( zoek_str );

if( !is_valid_str( zoek_str ) ) {
printf( "Ongeldige zoek string. Neemt alleen lowercase karakters!\n" ); 
return 0;
}

printf("%s\n",zoek_str);

init_words( zoek_str_len );

return 0;
}

这两个是我目前困惑的功能:

double calculate_fitness( char *zoek )
{

}

void mutate( char *arg )
{

}

此后我将逐代计算。

请注意,我只搜索在固定长度的字符串,如:strlen(argv [ 1 ])

所有这些的例子输出可能是:

generation    string     word      percentage

1             hfllr      hello     89.4%

2             hellq      hello     90.3%

3             hellp      hello     95.3%

4             hello      hello     100%

或类似的东西。

answer1: 回答1:

By comparing the two strings letter by letter a metric could be correct/max_length where 'correct' is the number of letters that match and 'max_length' is the length of the longest string.

For something more involved you could look up the concept of Edit distance.

通过比较两个字符串字母度量可以正确max_length在“正确”的是字母匹配和max_length '是最长的字符串的长度的数量。

对于更多的东西,你可以查找编辑距离的概念。

answer2: 回答2:

see Edit distance

also see Levenshtein distance

Basically what you are trying to measure is the minimum number of operations required to transform one string into the other.

看到编辑距离

Levenshtein也看到距离

基本上你试图测量的是将一个字符串转换成另一个字符串所需的最小操作数。

answer3: 回答3:

First of all, you need a metric on "distance between strings". A commonly used one is the Levenshtein distance, which measure the distance between two strings as the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one string into the other.

Googling this you can find multiple code examples on how to compute such distance. Once you have the distance, your fitness should be inversely proportional to it.

首先,你需要一个度量“字符串之间的距离”。一个常用的是Levenshtein距离,测量距离之间的两个字符串作为单个字符的最小数目的编辑(如插入,删除或替换)需要改变到另一字符串。

搜索这个你可以找到多个代码示例如何计算这样的距离。一旦你有距离,你的健身应该是成反比的。

algorithm