Count Most Commonly Used Words In A Txt File
Solution 1:
Actually, I would recommend that you continue to use Counter. It's a really useful tool for, well, counting things, and its syntax is expressive enough that you don't need to worry about sorting anything yourself. Using it, you can do:
from collections import Counter

# Open the file; the with statement closes it automatically afterwards.
with open("input.txt") as input_file:
    # Build a counter from every whitespace-separated word in the file.
    count = Counter(word for line in input_file
                    for word in line.split())

print(count.most_common(10))
With my input.txt, this produces the following output:
[('THE', 27643), ('AND', 26728), ('I', 20681), ('TO', 19198), ('OF', 18173), ('A', 14613), ('YOU', 13649), ('MY', 12480), ('THAT', 11121), ('IN', 10967)]
I've changed it a bit so that it reads the file one line at a time instead of loading the whole file into memory. My input.txt is a punctuation-free version of the works of Shakespeare, chosen to demonstrate that this code is fast: it takes about 0.2 seconds on my machine.
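The code above counts words exactly as str.split() produces them, so it relies on the input already being punctuation-free: "THE" and "The," would be counted as different words. If your text still contains punctuation and mixed case, one possible approach is to lowercase each line and pull the words out with a regular expression. This is only a sketch: the pattern [a-z']+ is an assumption about what counts as a word (it keeps apostrophes, drops digits, and ignores non-ASCII letters), and count_words is a hypothetical helper name.

```python
import io
import re
from collections import Counter

# Assumed definition of a "word": a run of ASCII letters or apostrophes.
WORD_RE = re.compile(r"[a-z']+")

def count_words(lines):
    """Count words in any iterable of lines (an open file, a list, ...).

    Hypothetical helper: lowercases each line and extracts words with
    WORD_RE, so "The," and "the" are counted as the same word.
    """
    return Counter(word
                   for line in lines
                   for word in WORD_RE.findall(line.lower()))

# An in-memory "file" so the example runs without input.txt.
sample = io.StringIO("The cat sat.\nThe dog sat, too!\nA cat? The cat.\n")
print(count_words(sample).most_common(3))
# → [('the', 3), ('cat', 3), ('sat', 2)]
```

With a real file you would pass the open file object instead of the StringIO, exactly as in the with block above. Ties in most_common are returned in the order the words were first seen, which is why 'the' comes before 'cat'.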
Your code was a bit haphazard; it looks like you tried to combine several approaches, keeping bits of each here and there. I've annotated my code with some explanatory comments. Hopefully it is relatively straightforward, but if you're still confused about anything, let me know.
Solution 2:
You haven't pulled anything from the .txt file yet. What does the inside of the text file look like? If you want to treat words as groups of characters separated by whitespace, you could get a list of the words with:
with open('path/to/file.txt', 'r') as f:
    # split() with no argument splits on any run of whitespace.
    words = f.read().split()
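It matters which string split is called on: text.split() splits the text itself, whereas ' '.split(text) splits the one-character string ' ' using the whole text as the separator. The small example below, using a made-up sample string, shows the difference, and also why split() with no argument is usually preferable to split(' ').

```python
text = "the quick  brown\nfox"

# Split the text on any run of whitespace (spaces, tabs, newlines).
print(text.split())       # ['the', 'quick', 'brown', 'fox']

# Splitting on a single space keeps the newline inside a "word" and
# produces an empty string for the double space.
print(text.split(" "))    # ['the', 'quick', '', 'brown\nfox']

# Reversed receiver: splits " " on the whole text, which never occurs
# in " ", so the result is just [' '].
print(" ".split(text))    # [' ']
```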
Then, to get the 10 most common words (there are probably more efficient ways, but this is what I found first):
# Count how many times each word appears.
word_counter = {}
for word in words:
    if word in word_counter:
        word_counter[word] += 1
    else:
        word_counter[word] = 1

# Sort the distinct words by their counts, most frequent first.
popular_words = sorted(word_counter, key=word_counter.get, reverse=True)
print(popular_words[:10])
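On the efficiency point: sorting every distinct word costs O(n log n), even though only the top 10 are needed. One standard-library alternative is heapq.nlargest, which selects the k largest items in roughly O(n log k). The sketch below uses a tiny made-up word list and writes the counting loop with dict.get, which supplies 0 for words not seen yet; the result is the same as the if/else version.

```python
import heapq

# Made-up sample input for illustration.
words = "a b a c b a d".split()

# Count occurrences; dict.get returns 0 for a word seen for the first time.
word_counter = {}
for word in words:
    word_counter[word] = word_counter.get(word, 0) + 1

# Select the 2 most frequent words without sorting the whole dictionary.
top = heapq.nlargest(2, word_counter, key=word_counter.get)
print(top)  # ['a', 'b']
```

For large vocabularies and small k this avoids a full sort; for a one-off script the difference is usually negligible.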