I would like a regex that returns all of the A's and all of the B's in a string like 'AABCTBA', with identical characters grouped together in the result, i.e. [AAA, BB]. Note that there may be thousands of repeated letters.
This works for just the A's:
re.findall(r'[A]','AABCTBA')
['A', 'A', 'A']
but this:
re.findall(r'[A B]','AABCTBA')
returns:
['A', 'A', 'B', 'B', 'A']
I want ['A', 'A', 'A', 'B', 'B'].
What I ultimately want is a count of the letters, so if there is a different way to get the letter count using regex, I'd love to see that.
CodePudding user response:
Since you want to count multiple things at once, use Counter.
You can feed it any iterable: either the string directly, or whatever you get from your regex if the thing you are counting is more complex and easier to pick out that way.
>>> import collections
>>> res=collections.Counter('AABCTBA')
>>> res
Counter({'A': 3, 'B': 2, 'C': 1, 'T': 1})
>>> res["A"]
3
>>> res["B"]
2
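If you only care about some of the letters, one way to combine the two ideas is to let the regex do the filtering and hand its matches to Counter. This is only a sketch: the pattern `[AB]` and the names `counts` and `grouped` are my own choices for illustration, not something from the question.

```python
import re
from collections import Counter

text = "AABCTBA"

# re.findall returns every single-character match of the class [AB];
# Counter then tallies how often each matched letter occurs.
counts = Counter(re.findall(r"[AB]", text))

# Rebuild the grouped strings from the counts instead of collecting matches.
grouped = [letter * counts[letter] for letter in "AB"]

print(counts["A"], counts["B"])
print(grouped)
```

With thousands of repeated letters, working from the counts should be cheaper than keeping a separate list of matches for every letter.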
CodePudding user response:
import re
text = "AABCTA"
for char in ('A', 'B'):
    print(re.findall(char, text))
# output:
# ['A', 'A', 'A']
# ['B']
If you are looking for a one-liner:
>>> print(*[re.findall(char, text) for char in ('A', 'B')], sep='\n')
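If you would rather get one flat list with equal letters next to each other, a possible variation is to sort the matches of a single character class. And if the count is all you need, plain `str.count` may be enough without any regex. Treat this as a sketch; the names `grouped` and `counts` are just for illustration.

```python
import re

text = "AABCTA"

# findall returns matches in string order; sorting puts equal letters together.
grouped = sorted(re.findall(r"[AB]", text))
print(grouped)

# Per-letter counts taken directly from the string, no match lists needed.
counts = {char: text.count(char) for char in ("A", "B")}
print(counts)
```

Sorting works here because each match is a single character; for longer patterns you would probably want to group by the matched text instead.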