Python frequency table, exclude characters

Good evening,

I'm wondering how I can exclude certain characters from a frequency table.

First I read the file and create a frequency table. After this I convert it to a list of tuples so that I can get the percentage of occurrence for each letter. However, I am wondering how I can make it skip certain letters, i.e. use an exclude list.

with open('test.txt', 'r') as file:
    data = file.read().replace('\n', '')



frequency_table = {char : data.count(char) for char in set(data)} 


x0= ("Character frequency table for '{}' is :\n {}".format(data, str(frequency_table)))


from collections import Counter
res = [(*key, val) for key, val in Counter(frequency_table).most_common()]
print("Frequency Tuple list : " + str(res))
print(res[1][1] / len(data))  # fraction of the text taken up by the second most common character

CodePudding user response：

Sounds like you need an if at the end of your dictionary comprehension:

frequency_table = {char : data.count(char) for char in set(data) if char not in EXCLUDE}

You can then set your EXCLUDE as, for example:

a list, e.g. ['a', 'b', 'c', 'd'] or list('abcd')
or a string of the characters used directly, such as 'abcd'.

>>> data = 'aaaabcdefededefefedf'
>>> EXCLUDE_LIST = 'a'
>>> frequency_table = {char : data.count(char) for char in set(data) if char not in EXCLUDE_LIST}
>>> frequency_table
{'b': 1, 'c': 1, 'e': 6, 'f': 4, 'd': 4}
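
If it helps, here is a minimal sketch showing that the container type you pick for the exclusions does not change the result. The helper name build_frequency_table and the sample exclusions are made up for illustration; the comprehension is the same one as above. Note that the string form only works because every item being tested is a single character.

```python
def build_frequency_table(data, exclude):
    """Count each distinct character in data, skipping anything in exclude."""
    return {char: data.count(char) for char in set(data) if char not in exclude}


data = 'aaaabcdefededefefedf'

# A list, a string and a set all support `in`, so they produce the same table.
from_list = build_frequency_table(data, ['a', 'b'])
from_string = build_frequency_table(data, 'ab')
from_set = build_frequency_table(data, {'a', 'b'})

print(from_list == from_string == from_set)  # True
print(sorted(from_list.items()))  # [('c', 1), ('d', 4), ('e', 6), ('f', 4)]
```

For a long exclusion list a set is probably the better choice, since membership tests on a set do not have to scan every element.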

CodePudding user response：

Add an if to your comprehension:

exclude = {'a', 'r'}
frequency_table = {char: data.count(char) for char in set(data) if char not in exclude}

Alternatively, use the difference between two sets:

exclude = {'a', 'r'}
frequency_table = {char: data.count(char) for char in set(data) - exclude}
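
Since the question also asks for percentages, here is one possible sketch that filters while counting and then computes each character's share. It swaps the dict comprehension for collections.Counter, so the text is only scanned once. The sample data and the variable names counts and percentages are assumptions for illustration, and the percentages are taken relative to the characters that were kept, which may or may not be what you want.

```python
from collections import Counter

data = 'aaaabcdefededefefedf'
exclude = {'a', 'r'}

# Count every character in a single pass, skipping the excluded ones.
counts = Counter(char for char in data if char not in exclude)

# Percentages are relative to the characters that were kept, not the full text.
total = sum(counts.values())
percentages = {char: 100 * n / total for char, n in counts.most_common()}

for char, pct in percentages.items():
    print(f"{char}: {pct:.1f}%")
```

If you want the share of the whole text instead, divide by len(data) rather than total.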