This problem requires me to find the frequency analysis of a .txt file.
This is my code so far: This finds the frequency of the words, but how would I get the frequency of the actual letters?
f = open('cipher.txt', 'r')
word_count = []
for c in f:
word_count.append(c)
word_count.sort()
decoding = {}
for i in word_count:
decoding[i] = word_count.count(i)
for n in decoding:
print(decoding)
This outputs (as a short example, since the txt file is pretty long):
{'\n': 12, 'vlvf zev jvg jrgs gvzef\n': 1, 'z uvfgriv sbhfv bu wboof!\n': 1, "gsv yrewf zoo nbhea zaw urfsvf'\n": 1, 'xbhow ube gsv avj bjave yv\n': 1, ' gsv fcerat rf czffrat -\n': 1, 'viva gsrf tezff shg\n': 1, 'bph ab sbfbnrxsr (azeebj ebzw gb gsv wvvc abegs)\n': 1, 'cbfg rafrwv gsv shg.\n': 1, 'fb gszg lvze -- gsv fvxbaw lvze bu tvaebph [1689] -- r szw fhwwvaol gzpva\n': 1, 'fb r czgxsvw hc nl gebhfvef, chg avj xbewf ra nl fgezj szg, zaw\n': 1, 'fcrergf bu gsv ebzw yvxpbavw nv, zaw r xbhow abg xbaxvagezgv ba zalgsrat.\n': 1, 'fgbbw zg gsv xebffebzwf bu czegrat, r jvcg tbbwylv.\n': 1,
It gives me the words, but how would I get the letters, such as how many "a"'s there are, or how many "b"'s there are?
CodePudding user response:
Counter
is quite a useful class native to Python, which can be used to solve your problem elegantly.
# count the letter freqency
from collections import Counter
with open('cipher.txt', 'r') as f:
s = f.read()
c = Counter(s) # the type of c is collection.Counter
# if you want dict as your output type
decoding = dict(c)
print(decoding)
If you put "every parting from you is like a little eternity" to your cipher.txt
, you'll get the following result with the code above:
{'e': 6, 'v': 1, 'r': 4, 'y': 3, ' ': 8, 'p': 1, 'a': 2, 't': 5, 'i': 5, 'n': 2, 'g': 1, 'f': 1, 'o': 2, 'm': 1, 'u': 1, 's': 1, 'l': 3, 'k': 1}
However, if you want to implement the counting by yourself, here's a possible solution, providing the same result as using Counter
.
# count the letter freqency, manually, without using collections.Counter
with open('cipher.txt', 'r') as f:
s = f.read()
decoding = {}
for c in s:
if c in decoding:
decoding[c] = 1
else:
decoding[c] = 1
print(decoding)
CodePudding user response:
You can use a Counter from the collections standard library, it'll generate a dictionary of results:
from collections import Counter
s = """
This problem requires me to find the frequency analysis of a .txt file.
This is my code so far: This finds the frequency of the words, but how would I get the frequency of the actual letters?"""
c = Counter(s)
print(c.most_common(5))
This will print:
[(' ', 35), ('e', 20), ('t', 13), ('s', 11), ('o', 10)]
EDIT: Without using a Counter
, we can use a dictionary and keep incrementing the count:
c = {}
for character in s:
try:
c[character] = 1
except KeyError:
c[character] = 1
print(c)
This will print:
{'\n': 4, 'T': 3, 'h': 9, 'i': 9, 's': 11, ' ': 35, 'p': 1, 'r': 9, 'o': 10, 'b': 2, 'l': 6, 'e': 20, 'm': 3, 'q': 4, 'u': 7, 't': 13, 'f': 10, 'n': 6, 'd': 5, 'c': 5, 'y': 5, 'a': 6, '.': 2, 'x': 1, ':': 1, 'w': 3, ',': 1, 'I': 1, 'g': 1, '?': 1}