Q: Read words from a file into a dictionary

In our assignment, my professor wants us to read a text file line by line, then word by word, and build a dictionary counting how often each word appears. Here's what I have so far:

import string

def count_words():
    wordcount = {}
    with open('/Users/user/Desktop/Text.txt', 'r', encoding='utf-8') as f:
        for line in f:
            for word in line.split():
                line = line.lower()
                word = word.strip(string.punctuation + string.digits)
                if word:
                    wordcount[word] = line.count(word)
    return wordcount

What happens is that my dictionary tells me how many times each word appears in a particular line, leaving me with mostly 1s even though some words appear many times across the entire text. How can I get my dictionary to count words across the entire text, not just a single line?
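To see why the counts come out as mostly 1s, here is a minimal reproduction of the overwrite. It assumes a small made-up sample text read through io.StringIO in place of Text.txt, and it lowercases each line before splitting so that only the overwrite is visible. (In the question's own code the lowercasing happens after line.split() has already produced the words, so a capitalized word such as "The" is searched for in an all-lowercase line and gets a count of 0.)

```python
import io
import string

# Hypothetical sample text standing in for Text.txt (an assumption for illustration).
sample = io.StringIO("The cat sat.\nThe dog sat.\nThe end.\n")

wordcount = {}
for line in sample:
    line = line.lower()
    for word in line.split():
        word = word.strip(string.punctuation + string.digits)
        if word:
            # Overwrites any earlier total with this line's count only.
            wordcount[word] = line.count(word)

print(wordcount)  # "the" ends up as 1 even though it appears on all three lines
```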

Answer 1:

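The problem is that you overwrite the count every time instead of adding to it. The fix is quite simple: keep a running total across the whole file. Because the inner loop already visits every occurrence of a word, add 1 per occurrence; adding line.count(word) instead would count a word that appears twice on one line four times.

import string

def count_words():
    wordcount = {}
    with open('/Users/user/Desktop/Text.txt', 'r', encoding='utf-8') as f:
        for line in f:
            line = line.lower()  # lowercase before splitting so "The" and "the" are counted together
            for word in line.split():
                word = word.strip(string.punctuation + string.digits)
                if word:
                    if word in wordcount:
                        wordcount[word] += 1  # seen before: add to the running total
                    else:
                        wordcount[word] = 1   # first occurrence
    return wordcount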
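A caution about line.count(word), which the question's code relies on: str.count counts substrings, not whole words, so "the" also matches inside "there" and "theme". Counting the tokens produced by split() avoids that. A small check on a made-up sentence:

```python
line = "there is the theme of the day"

# str.count matches substrings: "there", "the", "theme", "the"
substring_hits = line.count("the")
# Counting split() tokens matches whole words only
whole_word_hits = line.split().count("the")

print(substring_hits, whole_word_hits)  # 4 2
```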

Answer 2:

The problem is in this line:

wordcount[word] = line.count(word)

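Every time that line executes, whatever value wordcount[word] had is replaced by line.count(word), when you want to add to it instead. Since the loop already visits every occurrence of the word, add 1 each time, and use dict.get with a default of 0 so that the first occurrence doesn't raise a KeyError:

wordcount[word] = wordcount.get(word, 0) + 1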
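Another way to avoid the KeyError on a word's first occurrence is collections.defaultdict(int), which starts missing keys at 0. Here it is in a small function; the name count_words and the choice to accept any iterable of lines (such as an open file) are assumptions for illustration:

```python
import string
from collections import defaultdict

def count_words(lines):
    """Count whole words across all lines, case-insensitively (hypothetical helper)."""
    wordcount = defaultdict(int)  # missing keys start at 0
    for line in lines:
        for word in line.lower().split():
            word = word.strip(string.punctuation + string.digits)
            if word:
                wordcount[word] += 1  # accumulate across the whole input
    return dict(wordcount)

counts = count_words(["The cat sat.", "The dog sat on the mat."])
print(counts)
```

Because iterating over a file object yields its lines, the same function can be called as count_words(f) inside a with open(...) as f: block.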

Answer 3:

This is how I would do it:

import string

wordcount = {}
with open('test.txt', 'r') as f:
    for line in f:
        line = line.lower()  # I suppose you want "boy" and "Boy" to be the same word
        for word in line.split():
            # What if the word has punctuation characters attached to it? Remove them.
            word = word.translate(str.maketrans('', '', string.punctuation))
            if not word:
                continue  # the token was all punctuation, e.g. "--"
            # If it's already in the dict, increase the count
            try:
                wordcount[word] += 1
            # If it's not, this is the first time we are adding it
            except KeyError:
                wordcount[word] = 1

print(wordcount)

Good luck!
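In Python 3, str.translate takes a single table, built with the three-argument form of str.maketrans whose third argument lists the characters to delete; the Python 2 form word.translate(string.maketrans("",""), string.punctuation) no longer works. A quick check of how the Python 3 form behaves on a few sample tokens:

```python
import string

# Translation table that deletes every ASCII punctuation character.
table = str.maketrans('', '', string.punctuation)

print("Hello,".translate(table))  # Hello
print("don't!".translate(table))  # dont (internal apostrophes are removed too)
print(repr("--".translate(table)))  # '' (an all-punctuation token becomes empty)
```

Note the second case: deleting all punctuation merges contractions such as "don't" into "dont", which may or may not be what the assignment expects.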

Answer 4:

In case you want to see another way to do this: it's not exactly line by line and word by word as you requested, but you should be aware of the collections module, which can be very useful.

from collections import Counter

# Create an empty Counter
c = Counter()
with open('myfile.txt', 'r') as f:
    for line in f:
        # Do all the cleaning you need here
        c.update(line.lower().split())

# Get whatever statistics you want, for example the 10 most common words:
print(c.most_common(10))
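For the "cleaning" step, one option is to pull words out with a regular expression before updating the Counter. The helper name top_words and the word pattern (lowercase letters, optionally joined by apostrophes) are assumptions; adjust them to whatever your assignment counts as a word:

```python
import re
from collections import Counter

# Assumed definition of a word: letters, optionally joined by apostrophes (e.g. "don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

def top_words(lines, n=10):
    """Return the n most common words across all lines (hypothetical helper)."""
    c = Counter()
    for line in lines:
        c.update(WORD_RE.findall(line.lower()))
    return c.most_common(n)

print(top_words(["The cat sat.", "The dog sat on the mat."], n=2))  # [('the', 3), ('sat', 2)]
```

Since a file object yields lines, top_words(f) works directly on a file opened with with open('myfile.txt') as f:.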

Tags: python, dictionary