找到你要的答案

Q:Segmentation fault error in a program for counting no of occurences of a word in a file using threads

Q:在一个没有对文件使用线程一词出现程序分段错误

So i have the following problem: Implement a program that gets as arguments a file name followed by words. For each word, create a separate thread that counts its appearances in the given file.Print out the sum of the appearances of all words.

my code is:

#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <unistd.h>
#include <pthread.h>

pthread_mutex_t mtx; // used by each of the three threads to prevent  other threads from accessing global_sum during their additions

int global_sum = 0;
typedef struct{
                    char* word;
                    char* filename;
}MyStruct;



void *count(void*str)
{
    MyStruct *struc;
    struc = (MyStruct*)str; 
    const char *myfile = struc->filename;

    FILE *f;
    int count=0, j;
    char buf[50], read[100];
    // myfile[strlen(myfile)-1]='\0';
    if(!(f=fopen(myfile,"rt"))){
         printf("Wrong file name");
    }
    else
         printf("File opened successfully\n");
         for(j=0; fgets(read, 10, f)!=NULL; j++){
             if (strcmp(read[j],struc->word)==0)
                count++;
         }

    printf("the no of words is: %d \n",count);  
    pthread_mutex_lock(&mtx); // lock the mutex, to prevent other threads from accessing global_sum
    global_sum += count; // add thread's count result to global_sum
    pthread_mutex_unlock(&mtx); // unlock the mutex, to allow other threads to access the variable
}


int main(int argc, char* argv[]) {
    int i;
    MyStruct str; 

    pthread_mutex_init(&mtx, NULL); // initialize mutex
    pthread_t threads[argc-1]; // declare threads array 

    for (i=0;i<argc-2;i++){

       str.filename = argv[1];  
       str.word = argv[i+2];

       pthread_create(&threads[i], NULL, count, &str); 
    }

    for (i = 0; i < argc-1; ++i)
         pthread_join(threads[i], NULL);

    printf("The global sum is %d.\n", global_sum); // print global sum

    pthread_mutex_destroy(&mtx); // destroy the mutex

    return 0;

}

When I try to run it I get the segmentation fault error. Why is that? Thank you!

因此,我有以下问题:实现一个程序,得到作为参数的文件名后跟单词。对于每一个单词,创建一个单独的线程,计算它在给定文件中的外观,打印出所有单词外观的总和。

我的代码是:

#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <unistd.h>
#include <pthread.h>

pthread_mutex_t mtx; // used by each of the three threads to prevent  other threads from accessing global_sum during their additions

int global_sum = 0;
typedef struct{
                    char* word;
                    char* filename;
}MyStruct;



void *count(void*str)
{
    MyStruct *struc;
    struc = (MyStruct*)str; 
    const char *myfile = struc->filename;

    FILE *f;
    int count=0, j;
    char buf[50], read[100];
    // myfile[strlen(myfile)-1]='\0';
    if(!(f=fopen(myfile,"rt"))){
         printf("Wrong file name");
    }
    else
         printf("File opened successfully\n");
         for(j=0; fgets(read, 10, f)!=NULL; j++){
             if (strcmp(read[j],struc->word)==0)
                count++;
         }

    printf("the no of words is: %d \n",count);  
    pthread_mutex_lock(&mtx); // lock the mutex, to prevent other threads from accessing global_sum
    global_sum += count; // add thread's count result to global_sum
    pthread_mutex_unlock(&mtx); // unlock the mutex, to allow other threads to access the variable
}


int main(int argc, char* argv[]) {
    int i;
    MyStruct str; 

    pthread_mutex_init(&mtx, NULL); // initialize mutex
    pthread_t threads[argc-1]; // declare threads array 

    for (i=0;i<argc-2;i++){

       str.filename = argv[1];  
       str.word = argv[i+2];

       pthread_create(&threads[i], NULL, count, &str); 
    }

    for (i = 0; i < argc-1; ++i)
         pthread_join(threads[i], NULL);

    printf("The global sum is %d.\n", global_sum); // print global sum

    pthread_mutex_destroy(&mtx); // destroy the mutex

    return 0;

}

When I try to run it I get the segmentation fault error. Why is that? Thank you!

answer1: 回答1:

First off, your code is terribly formatted. It's not even consistent. It also does not appear you are compiling with warnings enabled.

If you are a university course and they did not tell you how do format the code and compile with warnings, I strongly suggest you ask your tutors what gives.

If using gcc, add -Wall -Wextra. For coding style, I recommend stealing one either from Linux or FreeBSD. There are various editors which format the code for you, including real editors like vim (which is worth trying out even though it may look harsh).

Your coding style helps you screw yourself over.

void *count(void*str)
{
    MyStruct *struc;
    struc = (MyStruct*)str;
    const char *myfile = struc->filename;

    FILE *f;
    int count=0, j;
    char buf[50], read[100];

buf is unused, which you would learn if you had warnings enabled. read is a bad name.

    // myfile[strlen(myfile)-1]='\0';
    if(!(f=fopen(myfile,"rt"))){
         printf("Wrong file name");
    }
    else

Because you don't return (as you should have) you are setting yourself up for a screwup where you execute code you should not. And guess not, your 'else' clause is deffective. You are missing curly braces, so teh for loop below is executed even if file open operation failed.

         printf("File opened successfully\n");
         for(j=0; fgets(read, 10, f)!=NULL; j++){

10? Seems like a typo as you likely meant 100. This would not have happened if you used sizeof.

             if (strcmp(read[j],struc->word)==0)
                count++;
         }

It is unclear what are you doing here. It seems you wanted to do strcmp starting from read[0], read1 and so on. But you read new data which replace stuff in the original buffer and then you advance it by one. This makes no sense whatsoever. Finally, you are doing it wrong anyway. read[j] does not evaluate to an address and once more the compiler would have told you that if you asked it to.

strcmp approach is very bad anyway. Try an approach where you try to match the first character and work from there.

int main(int argc, char* argv[]) {

Standard misplaced '*'. Use char *argv[] instead.

    int i;
    MyStruct str;

    pthread_mutex_init(&mtx, NULL); // initialize mutex
    pthread_t threads[argc-1]; // declare threads array

Highly not recommended. Validate arguments first and then have a dedicated variable which holds an amount of threads. At this point you can allocate an array.

    for (i=0;i<argc-2;i++){

       str.filename = argv[1];
       str.word = argv[i+2];

       pthread_create(&threads[i], NULL, count, &str);
    }

Similarly to threads, save the path somewhere. Referring to it as argv1 is bad style which will come back to bite you. Using argv for words is fine.

However, this is wrong in general. You setup a local struct and pass it to a thread, then immediately change it. So what happens is at the end of the day all your threads are counting the same word. But the word they were counting changed along the way.

    for (i = 0; i < argc-1; ++i)
         pthread_join(threads[i], NULL);

Go figure. You did not have a variable which held an amount of threads and this lead to this inconsistency (argc - 1 vs argc - 2).

In general this issue could have been solved for the most part with proper reading of compiler warnings, and avoided in general if basic good practices were employed.

Of course bugs happen regardless of that and in such cases you can at the very least narrow them down.

Finally, few words about the general approach. It is unclear what was the point of the exercise. You actually have to force yourself to use anything more than pthread_create and pthread_join. Let's assume the only requirement is to use threads.

I don't know if they force you to open the file multiple times or what. Opening and reading stuff multiple times is not only wasteful but opens you up for a situation where the file is replaced and some threads open a different one.

An OK solution would open the file once in main. Once open, you would mmap the file and fstat for size. If for some reason you can't use mmap, you would malloc a large enough buffer and read the file.

Then all threads can get an address of that buffer, the word to look for and an address to which they should store the counter (each thread gets a different address).

When all threads exited you loop over to sum the result.

Either way no locking is involved.

首先,您的代码格式非常糟糕。它甚至不一致。它也不会出现您正在编译的警告启用。

如果你是一个大学课程,他们没有告诉你如何格式化代码和编译警告,我强烈建议你问你的导师什么给。

如果使用GCC,增加墙体的wextra。编码风格,我推荐偷一无论是从Linux或FreeBSD。有各种各样的编辑格式代码给你,包括真正的编辑像Vim(这是值得尝试,尽管它可能看起来恶劣)。

您的编码风格帮助您拧自己。

void *count(void*str)
{
    MyStruct *struc;
    struc = (MyStruct*)str;
    const char *myfile = struc->filename;

    FILE *f;
    int count=0, j;
    char buf[50], read[100];

但闲置,你会知道如果你有警告功能。读是个坏名字。

    // myfile[strlen(myfile)-1]='\0';
    if(!(f=fopen(myfile,"rt"))){
         printf("Wrong file name");
    }
    else

因为你不回(你应该)你是为自己设定一个错误在你执行代码,你不应该。你猜,你的“其他”条款有效。你错过了花括号,那么该环下面是即使文件打开操作失败执行。

         printf("File opened successfully\n");
         for(j=0; fgets(read, 10, f)!=NULL; j++){

10?似乎是一个错字,你可能是100。这不会发生如果你使用sizeof。

             if (strcmp(read[j],struc->word)==0)
                count++;
         }

目前还不清楚你在这里做什么。看来你想做strcmp从阅读[ 0 ],read1等等。但您读取新的数据,取代原来的缓冲区的东西,然后你提前一个。这没有任何意义。最后,你做错了。不要对一个地址进行评估,如果你要求它,编译器会告诉你的。

strcmp做法很糟糕吧。尝试一种方法,你尝试匹配的第一个字符,并从那里工作。

int main(int argc, char* argv[]) {

标准错位' * '。使用char argv [ ]代替。

    int i;
    MyStruct str;

    pthread_mutex_init(&mtx, NULL); // initialize mutex
    pthread_t threads[argc-1]; // declare threads array

高度不推荐。首先验证参数,然后有一个专用的变量,其中持有线程量。此时可以分配数组。

    for (i=0;i<argc-2;i++){

       str.filename = argv[1];
       str.word = argv[i+2];

       pthread_create(&threads[i], NULL, count, &str);
    }

类似于线程,在某处保存路径。指的是它argv1是不良作风会回来咬你。使用argv的话是好的。

然而,这是错误的一般。你设置一个本地的结构并将它传递给一个线程,然后立即改变它。所以发生的是在一天结束的时候你所有的线程都在计算同一个词。但他们所计数的字改变了前进的道路。

    for (i = 0; i < argc-1; ++i)
         pthread_join(threads[i], NULL);

去图。你没有变,举行线程的金额以及导致这种不一致(argc 1 vs argc - 2)。

在一般情况下,这个问题可能已经解决了大部分的编译器警告的正确阅读,并避免一般,如果基本的良好做法。

当然,错误的发生,无论在这种情况下,你至少可以缩小他们下来。

最后,关于一般方法的几句话。目前还不清楚演习的重点是什么。你必须强迫自己使用任何超过pthread_create和pthread_join。让我们假设唯一的要求是使用线程。

我不知道他们是否强迫你打开文件多次或什么。多次打开和读取文件不仅浪费,而且会为文件替换和一些线程打开不同的文件打开您的位置。

一个确定的解决方案将打开文件在主。一旦打开,你将大小文件和fstat mmap。如果因为某些原因你不能使用mmap,你会分配一个足够大的缓冲和读取文件。

然后所有线程都可以得到该缓冲区的地址、要查找的字以及存储计数器的地址(每个线程都有不同的地址)。

当所有线程退出时,循环返回结果。

任何方式都不涉及锁定。

answer2: 回答2:

In main() your two i loops are different

for (i=0;i<argc-2;i++){
    ...
    pthread_create(&threads[i], NULL, count, &str); 
}

and then

for (i = 0; i < argc-1; ++i)
    pthread_join(threads[i], NULL);

and in this second loop you are referencing threads[argc-2] which was not created in the first loop.

在你我的循环是不同的两main()

for (i=0;i<argc-2;i++){
    ...
    pthread_create(&threads[i], NULL, count, &str); 
}

然后

for (i = 0; i < argc-1; ++i)
    pthread_join(threads[i], NULL);

在这第二环路你引用线程[ argc-2 ]这不是第一环的创建。

answer3: 回答3:

reading (upto) 10 characters at a time (probably) will miss some instances of the word being search for.

the strcmp() always starts at the beginning of the 10 characters.

1) need to look for the target word at any location in the file.

2) need to look for the target word at any location in the read-in buffer.

Suggest:

0) clear input buffer
1) input one char at a time, 
accumulating characters in the input buffer,
2) when a word separator found, (for instance a space or EOF)
3) then check if the word matches the target word.  
4) if matches, increment count.   
5) if EOF, then exit, else goto 0 

阅读(到)10个字符一次(可能)会错过一些词被搜索的实例。

的strcmp()总是从10个字符开始。

1)需要在文件中的任何位置查找目标词。

2)需要在读取缓冲区的任何位置寻找目标词。

建议:

0) clear input buffer
1) input one char at a time, 
accumulating characters in the input buffer,
2) when a word separator found, (for instance a space or EOF)
3) then check if the word matches the target word.  
4) if matches, increment count.   
5) if EOF, then exit, else goto 0 
c  linux  multithreading  file  pthreads