How can I find out the alphabet and alphabetic transition frequency?

Time:10-03

Example : 'peter piper picked a peck of pickled peppers'

First, I calculated the frequency of each individual letter.

abc = 'peter piper picked a peck of pickled peppers'
set(abc)
freq = {}
for i in abc:
    freq[i] = abc.count(i)
freq

But I couldn't work out how to count the transitions from one letter to the next, as in the list below. How can I get these counts?

  1. p>e : 5
  2. e>t : 1
  3. t>e : 1
  4. e>r : 3

CodePudding user response:

  1. Use zip with the string and a copy of it shifted by one to get pairs of each character and the next one.
  2. Use collections.Counter to count the pairs. It could also reduce your single-letter code to one line.

from collections import Counter

Counter(zip(abc, abc[1:]))

Returns:

Counter({('p', 'e'): 5,
         ('e', 't'): 1,
         ('t', 'e'): 1,
         ('e', 'r'): 3,
         ('r', ' '): 2,
         (' ', 'p'): 5,
         ('p', 'i'): 3,
         ('i', 'p'): 1,
         ('i', 'c'): 2,
         ('c', 'k'): 3,
         ('k', 'e'): 1,
         ('e', 'd'): 2,
         ('d', ' '): 2,
         (' ', 'a'): 1,
         ('a', ' '): 1,
         ('e', 'c'): 1,
         ('k', ' '): 1,
         (' ', 'o'): 1,
         ('o', 'f'): 1,
         ('f', ' '): 1,
         ('k', 'l'): 1,
         ('l', 'e'): 1,
         ('e', 'p'): 1,
         ('p', 'p'): 1,
         ('r', 's'): 1})
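
To print these counts in the "p>e : 5" layout from the question, one option is to loop over the counter's items. This is a minimal sketch, assuming Python 3.7+ so that the counter keeps pairs in first-seen order; the names pair_counts and lines are chosen here for illustration only.

```python
from collections import Counter

abc = 'peter piper picked a peck of pickled peppers'

# Count every (current, next) character pair, as in the answer above.
pair_counts = Counter(zip(abc, abc[1:]))

# Build one "a>b : n" line per pair; order follows first appearance in the text.
lines = [f"{a}>{b} : {n}" for (a, b), n in pair_counts.items()]

# Show the first four transitions, matching the list in the question.
for line in lines[:4]:
    print(line)
```

Pairs that include a space (such as "r> ") are still in the full list; they can be skipped with a condition like ' ' not in (a, b) if only letter-to-letter transitions are wanted.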

CodePudding user response:

collections.Counter is your friend:

from collections import Counter

sentence = 'peter piper picked a peck of pickled peppers'
letter_pairs = [sentence[i:i + 2] for i in range(len(sentence) - 1)]
pair_freq = Counter(letter_pairs)

Gives

Counter({'pe': 5, ' p': 5, 'er': 3, 'pi': 3, 'ck': 3, 'r ': 2, 'ic': 2, 'ed': 2, 'd ': 2, 'et': 1, 'te': 1, 'ip': 1, 'ke': 1, ' a': 1, 'a ': 1, 'ec': 1, 'k ': 1, ' o': 1, 'of': 1, 'f ': 1, 'kl': 1, 'le': 1, 'ep': 1, 'pp': 1, 'rs': 1})

You can filter out pairs with spaces like this:

internal_pair_freq = {k: v for k, v in pair_freq.items() if ' ' not in k}
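
An alternative to filtering afterwards is to split the sentence into words first, so that no pair ever crosses a word boundary. This is only a sketch, assuming words are separated by whitespace; word_pair_freq is a name introduced here for illustration.

```python
from collections import Counter

sentence = 'peter piper picked a peck of pickled peppers'

# Slide a two-character window over each word separately.
# One-letter words such as 'a' contribute no pairs.
word_pair_freq = Counter(
    word[i:i + 2]
    for word in sentence.split()
    for i in range(len(word) - 1)
)

print(word_pair_freq.most_common(1))
```

For this sentence the result should hold the same pairs and counts as the filtered dictionary above, but as a Counter, so methods such as most_common remain available.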

This also works for single letters, of course:

letter_freq = Counter(sentence)

Giving

Counter({'p': 9, 'e': 8, ' ': 7, 'r': 3, 'i': 3, 'c': 3, 'k': 3, 'd': 2, 't': 1, 'a': 1, 'o': 1, 'f': 1, 'l': 1, 's': 1})