How to perform letter frequency?

This problem requires me to perform a frequency analysis on a .txt file.

This is my code so far. It finds the frequency of the words, but how would I get the frequency of the actual letters?

f = open('cipher.txt', 'r')
word_count = []

for c in f:
  word_count.append(c)


word_count.sort()

decoding = {}

for i in word_count:
  decoding[i] = word_count.count(i)

for n in decoding:
  print(decoding)

This outputs (as a short example, since the txt file is pretty long):

{'\n': 12, 'vlvf zev jvg jrgs gvzef\n': 1, 'z uvfgriv sbhfv bu wboof!\n': 1, "gsv yrewf zoo nbhea zaw urfsvf'\n": 1, 'xbhow ube gsv avj bjave yv\n': 1, '    gsv fcerat rf czffrat -\n': 1, 'viva gsrf tezff shg\n': 1, 'bph ab sbfbnrxsr (azeebj ebzw gb gsv wvvc abegs)\n': 1, 'cbfg rafrwv gsv shg.\n': 1, 'fb gszg lvze -- gsv fvxbaw lvze bu tvaebph [1689] -- r szw fhwwvaol gzpva\n': 1, 'fb r czgxsvw hc nl gebhfvef, chg avj xbewf ra nl fgezj szg, zaw\n': 1, 'fcrergf bu gsv ebzw yvxpbavw nv, zaw r xbhow abg xbaxvagezgv ba zalgsrat.\n': 1, 'fgbbw zg gsv xebffebzwf bu czegrat, r jvcg tbbwylv.\n': 1,

It gives me the words, but how would I get the letters, such as how many "a"s or "b"s there are?

CodePudding user response:

Counter, a class in the collections module of Python's standard library, solves this problem elegantly.

# count how often each character occurs
from collections import Counter

with open('cipher.txt', 'r') as f:
    s = f.read()

c = Counter(s)  # c is a collections.Counter (a dict subclass)

# if you want a plain dict as your output type
decoding = dict(c)
print(decoding)

If you put "every parting from you is like a little eternity" in your cipher.txt (with no trailing newline), the code above gives the following result:

{'e': 6, 'v': 1, 'r': 4, 'y': 3, ' ': 8, 'p': 1, 'a': 2, 't': 5, 'i': 5, 'n': 2, 'g': 1, 'f': 1, 'o': 2, 'm': 1, 'u': 1, 's': 1, 'l': 3, 'k': 1}

However, if you want to implement the counting yourself, here's a possible solution that gives the same result as Counter.

# count how often each character occurs, manually, without using collections.Counter

with open('cipher.txt', 'r') as f:
    s = f.read()

decoding = {}
for c in s:
    if c in decoding:
        decoding[c] += 1  # seen before: increment the count
    else:
        decoding[c] = 1   # first occurrence: start at 1

print(decoding)
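
Note that both versions above count every character, including spaces and newlines. If only letters matter, one option is to filter with str.isalpha() before counting. Here's a minimal sketch, assuming case should be ignored; the function name letter_frequency is just illustrative, not part of any library:

```python
from collections import Counter


def letter_frequency(text):
    """Count alphabetic characters only, treating upper and lower case alike."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


sample = "every parting from you is like a little eternity"
counts = letter_frequency(sample)
print(counts.most_common(1))  # [('e', 6)]
print(' ' in counts)          # False, spaces are filtered out
```

To use it on the file, pass the string returned by f.read() instead of sample.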

CodePudding user response:

You can use Counter from the collections module in the standard library. It generates a dictionary-like object of results:

from collections import Counter
s = """

This problem requires me to find the frequency analysis of a .txt file.

This is my code so far: This finds the frequency of the words, but how would I get the frequency of the actual letters?"""

c = Counter(s)
print(c.most_common(5))

This will print:

[(' ', 35), ('e', 20), ('t', 13), ('s', 11), ('o', 10)]

EDIT: Without using a Counter, we can use a dictionary and keep incrementing the count:

c = {}
for character in s:
    try:
        c[character] += 1  # raises KeyError the first time a character is seen
    except KeyError:
        c[character] = 1
print(c)

This will print:

{'\n': 4, 'T': 3, 'h': 9, 'i': 9, 's': 11, ' ': 35, 'p': 1, 'r': 9, 'o': 10, 'b': 2, 'l': 6, 'e': 20, 'm': 3, 'q': 4, 'u': 7, 't': 13, 'f': 10, 'n': 6, 'd': 5, 'c': 5, 'y': 5, 'a': 6, '.': 2, 'x': 1, ':': 1, 'w': 3, ',': 1, 'I': 1, 'g': 1, '?': 1}
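
For frequency analysis of a cipher text, relative frequencies are often easier to compare than raw counts. Here's a rough sketch, assuming only letters matter and case is ignored; relative_frequencies is a hypothetical helper name, and the sample string is a fragment of the output shown in the question:

```python
from collections import Counter


def relative_frequencies(text):
    """Return (letter, share) pairs, most frequent first; shares sum to 1."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        return []  # no letters at all, avoid dividing by zero
    return [(letter, n / total) for letter, n in counts.most_common()]


sample = "gsv fcerat rf czffrat"
for letter, share in relative_frequencies(sample):
    print(f"{letter}: {share:.1%}")
```

Sorting by share makes it easy to line up the most common cipher letters against the most common letters of the expected plaintext language.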