Creating a histogram to map character frequency

Time:10-14

I am creating a function that returns a histogram with each letter of the alphabet and asterisks that map out how many times each character appears in a string. So far I have:

alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

def character_frequency_string(text):
    #remove_extraneous function removes anything that is not a letter in the alphabet from the text string
    new_text = remove_extraneous(text)
    
    for char in new_text:
        if char in new_text:
            print(char + ' ' + '*' * new_text.count(char))
        if char not in new_text:
            print(char)

My docstring is the following (with the outputs as they are right now, incorrect):

'''
    Examples:
    >>> character_frequency_string('hello world!')
    h *
    e *
    l ***
    l ***
    o **
    w *
    o **
    r *
    l ***
    d *
    >>> character_frequency_string('testing!')
    t **
    e *
    s *
    t **
    i *
    n *
    g *
    '''

The correct output for 'hello world!' would be:

[image of the expected histogram]

How can I change my code so that the histogram works as intended? It should list every letter of the alphabet in order, with one asterisk per occurrence beside each letter, and still display letters that don't appear in the text, just with no asterisks.
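
The remove_extraneous helper used in the question is not shown. A minimal sketch of what it might look like, assuming it keeps only the ASCII letters a-z (the lowercasing step is an assumption, not something the question states):

```python
from string import ascii_lowercase

def remove_extraneous(text):
    # Assumption: upper-case letters are folded to lower case first.
    # Everything that is not a-z (spaces, digits, punctuation) is dropped.
    return ''.join(ch for ch in text.lower() if ch in ascii_lowercase)
```

With this sketch, remove_extraneous('hello world!') gives 'helloworld', which matches the letters that show up in the docstring output above.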

CodePudding user response:

Iterate over alphabet rather than over the text, so every letter is printed exactly once and in order:

alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k',
            'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']


def character_frequency_string(text):
    new_text = text
    for char in alphabet:
        print(char + ' ' + '*' * new_text.count(char))


character_frequency_string('hello world!')

Output

a 
b 
c 
d *
e *
f 
g 
h *
i 
j 
k 
l ***
m 
n 
o **
p 
q 
r *
s 
t 
u 
v 
w *
x 
y 
z 

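The above solution calls count once per letter, so it scans the whole text 26 times: O(k*n) for an alphabet of k letters and a text of length n. A more performant alternative is collections.Counter, which tallies every character in a single pass.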
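
To make the single-pass idea concrete, the counting can also be done with a plain dict. This is only a sketch; count_letters is a hypothetical name and not part of either answer.

```python
from string import ascii_lowercase

def count_letters(text):
    # Start every letter at zero so letters missing from the text still appear.
    counts = {letter: 0 for letter in ascii_lowercase}
    # Single pass over the text. Assumption: upper-case letters count
    # toward their lower-case letter; anything that is not a-z is skipped.
    for ch in text.lower():
        if ch in counts:
            counts[ch] += 1
    return counts

def character_frequency_string(text):
    counts = count_letters(text)
    for letter in ascii_lowercase:
        print(letter + ' ' + '*' * counts[letter])
```

Because non-letters are skipped while counting, a separate clean-up step is not needed in this version.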

CodePudding user response:

You could do the following, using a collections.Counter and f-strings:

from collections import Counter
from string import ascii_lowercase as alphabet

def character_frequency_string(text):
    c = Counter(text.lower())
    for x in alphabet:
        print(f"{x} {'*' * c[x]}")

>>> character_frequency_string("hello world!")
a 
b 
c 
d *
e *
f 
g 
h *
i 
j 
k 
l ***
m 
n 
o **
p 
q 
r *
s 
t 
u 
v 
w *
x 
y 
z 
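
The question mentions a function that returns the histogram, while the code above prints it. A hedged variant that builds the string instead, keeping the same line layout (histogram_string is a made-up name for illustration):

```python
from collections import Counter
from string import ascii_lowercase

def histogram_string(text):
    # Hypothetical helper: same layout as the printed version, but returned.
    counts = Counter(text.lower())
    # A Counter returns 0 for missing keys, so absent letters get no asterisks.
    lines = [f"{letter} {'*' * counts[letter]}" for letter in ascii_lowercase]
    return "\n".join(lines)
```

Calling print(histogram_string("hello world!")) should show the same 26 lines as above. Returning the string also makes the result easier to compare in a test.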

Some docs: collections.Counter and formatted string literals (f-strings).