I have a Python dictionary and I want to count the number of keys that have a specific format: ‘letter, number, number’. In my case the key always begins with the letter ‘A’; only the numbers change. Examples: A12, A16, A71.
For example, I want to count all the entries that have this AXX format (where each X is a digit):
my_dict = {'A34': 83, 'B32': 70, 'A44': 66, 'A12': 47, 'B90': 71}
I know I can count all the entries of my dictionary by using:
print(len(my_dict.keys()))
But how do I count only the entries whose keys have the format I need?
Answer:
You can use a generator expression inside the sum() function:
print(sum(1 for k in my_dict.keys() if k.startswith('A') and len(k) == 3 and k[1:3].isdigit()))
This does three checks: whether the key starts with 'A', whether its length is 3, and whether its last two characters are digits.
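The three checks can also be pulled out into a named helper, which makes the condition easier to read and to test on its own. This is a minimal sketch using the example dict from the question; the helper name is_a_two_digit_key is hypothetical and not part of the answer.

```python
def is_a_two_digit_key(key: str) -> bool:
    """Return True for keys shaped like 'A' followed by exactly two digits."""
    return (
        key.startswith('A')        # check 1: first character is 'A'
        and len(key) == 3          # check 2: exactly three characters
        and key[1:3].isdigit()     # check 3: the last two characters are digits
    )

my_dict = {'A34': 83, 'B32': 70, 'A44': 66, 'A12': 47, 'B90': 71}

# True counts as 1 inside sum(), so this adds up the matching keys
count = sum(is_a_two_digit_key(k) for k in my_dict)
print(count)  # 3
```

Note that str.isdigit() also returns True for some non-ASCII characters such as '²', so this check is slightly more permissive than a pattern restricted to 0-9.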
You can also use a regular expression:
import re
print(sum(1 for k in my_dict.keys() if re.match(r'^A\d{2}$', k)))
Both snippets output 3.
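A variant of the regex approach compiles the pattern once and uses re.fullmatch(), which requires the whole key to match, so no ^ or $ anchors are needed. This is a sketch, not the answerer's code; it also shows one edge case where the two forms differ, because $ also matches just before a trailing newline.

```python
import re

my_dict = {'A34': 83, 'B32': 70, 'A44': 66, 'A12': 47, 'B90': 71}

# Compiled once; fullmatch() must consume the entire key
pattern = re.compile(r'A\d{2}')

count = sum(1 for k in my_dict if pattern.fullmatch(k))
print(count)  # 3

# '$' also matches just before a trailing newline; fullmatch() does not
print(bool(re.match(r'^A\d{2}$', 'A12\n')))  # True
print(bool(pattern.fullmatch('A12\n')))      # False
```

In str patterns \d also matches non-ASCII decimal digits; pass re.ASCII or write [0-9] if only 0-9 should count.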
Answer:
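import re
my_dict = { ... }
# [A-Z] accepts any capital letter; use A instead to count only the A keys.
# The trailing $ rejects longer keys such as 'B123'.
filtered = filter(lambda k: re.match(r"^[A-Z][0-9]{2}$", k), my_dict.keys())
# filter() returns a lazy iterator in Python 3, which has no len(), so build a list first
print(len(list(filtered)))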
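Because the [A-Z] pattern accepts any capital letter, a single pass can tally matching keys for every letter at once. This is a minimal sketch, assuming per-letter totals are useful; collections.Counter is swapped in here and is not part of the answer above.

```python
import re
from collections import Counter

my_dict = {'A34': 83, 'B32': 70, 'A44': 66, 'A12': 47, 'B90': 71}

# Any capital letter followed by exactly two digits ("letter, number, number")
pattern = re.compile(r'[A-Z][0-9]{2}')

# One pass over the keys, counting matching keys by their leading letter
counts = Counter(k[0] for k in my_dict if pattern.fullmatch(k))
print(counts['A'])  # 3
print(counts['B'])  # 2
```

Looking up a letter with no matches, such as counts['C'], returns 0 rather than raising KeyError.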
Answer:
You can try a list comprehension:
len([key for key in my_dict if 'A' in key])
For your specific condition, you can try the following; if you need to be stricter, put a regex in the if clause.
len([key for key in my_dict if key.startswith('A') and len(key) == 3])
This counts every three-character key that starts with A, but it does not check that the last two characters are digits.
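To see how much each condition lets through, here is a small comparison on a made-up dict. The keys are hypothetical, chosen only to expose the cases where the looser conditions overcount.

```python
# Hypothetical keys chosen to show where the looser conditions overcount
my_dict = {'A34': 1, 'BA2': 2, 'A1X': 3, 'A123': 4, 'A12': 5}

# 'A' anywhere in the key
loose = len([key for key in my_dict if 'A' in key])
# starts with 'A' and is three characters long
prefix_len = len([key for key in my_dict if key.startswith('A') and len(key) == 3])
# additionally requires the last two characters to be digits
strict = len([key for key in my_dict
              if key.startswith('A') and len(key) == 3 and key[1:].isdigit()])
print(loose, prefix_len, strict)  # 5 3 2
```

Only the strict version rejects 'A1X'; only the prefix-based versions reject 'BA2'.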
Answer:
Go through all 100 possible keys (A00 to A99) and check each one?
result = sum(f'A{i:02}' in my_dict for i in range(100))
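The same idea generalizes to other prefixes and digit counts: it performs 10**width dictionary lookups instead of scanning every key, so it pays off when 10**width is small compared with the size of the dict. This is a sketch; the function name count_by_membership and its parameters are hypothetical, and it only finds keys whose digits are zero-padded ASCII digits.

```python
def count_by_membership(d, prefix='A', width=2):
    """Count keys of the form prefix + exactly `width` zero-padded digits."""
    # f'{i:0{width}}' zero-pads i to `width` digits, e.g. 7 -> '07' for width 2
    return sum(f'{prefix}{i:0{width}}' in d for i in range(10 ** width))

my_dict = {'A34': 83, 'B32': 70, 'A44': 66, 'A12': 47, 'B90': 71}
print(count_by_membership(my_dict))  # 3
```

Each `in` test on a dict is an average O(1) hash lookup, which is why 100 lookups can beat iterating over thousands of keys.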
Benchmark, along with the solutions from the accepted answer, using a dict like the one you described as your real one ("about 5000 items" and "all my A values have 2 digits. However other values that I have such as B and C values will have 3 digits."):
41.5 μs sum(f'A{i:02}' in my_dict for i in range(100))
573.5 μs sum(1 for k in my_dict.keys() if k.startswith('A') and len(k) == 3 and k[1:3].isdigit())
3546.0 μs sum(1 for k in my_dict.keys() if re.match('^A\d{2}$', k))
Benchmark code:
from timeit import repeat
# Build a dict of about 5000 keys: all 100 two-digit A keys plus 4900 keys sampled from B2Z
setup = '''
from random import sample
from string import ascii_uppercase as letters
import re
A = [f'A{i:02}' for i in range(100)]
B2Z = [f'{letter}{i}' for letter in letters for i in range(10, 1000)]
A2Z = sample(A + sample(B2Z, 4900), 5000)
my_dict = dict.fromkeys(A2Z)
'''
E = [
    "sum(f'A{i:02}' in my_dict for i in range(100))",
    "sum(1 for k in my_dict.keys() if k.startswith('A') and len(k) == 3 and k[1:3].isdigit())",
    "sum(1 for k in my_dict.keys() if re.match('^A\\d{2}$', k))",
]
# Three rounds; each line shows the fastest repeat divided by number, i.e. a per-call time in μs
for _ in range(3):
    for e in E:
        number = 10
        t = min(repeat(e, setup, number=number)) / number
        print('%6.1f μs ' % (t * 1e6), e)
    print()
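Before comparing timings, it can be worth confirming that the three expressions count the same keys. This sketch rebuilds a dict the same way as the setup string, with an added fixed seed (not in the original) so the check is reproducible. Note that letters includes 'A', so B2Z also contains keys such as 'A10' and 'A100'; the duplicates collapse in dict.fromkeys, and the three-digit ones are rejected by all three checks.

```python
import re
from random import sample, seed
from string import ascii_uppercase as letters

seed(0)  # added for reproducibility; the benchmark itself does not seed

# Same construction as the benchmark's setup string
A = [f'A{i:02}' for i in range(100)]
B2Z = [f'{letter}{i}' for letter in letters for i in range(10, 1000)]
my_dict = dict.fromkeys(sample(A + sample(B2Z, 4900), 5000))

results = [
    sum(f'A{i:02}' in my_dict for i in range(100)),
    sum(1 for k in my_dict if k.startswith('A') and len(k) == 3 and k[1:3].isdigit()),
    sum(1 for k in my_dict if re.match(r'^A\d{2}$', k)),
]
print(results)  # [100, 100, 100]
```

If the counts ever disagreed, the timings would be comparing different amounts of work.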