I need to find all 10-digit numbers in a text that start with one of a given set of digit prefixes. Here is an example:
a_string = "Some text 6401104219 and 6401104202 and 2201104202"
matches = ["240", "880", "898", "910", "920", "960", "209", "309", "409", "471", "640"]
The expected result is: 6401104219, 6401104202
CodePudding user response:
You can use regular expressions and str.startswith:
import re
result = [s for s in re.findall(r"\d{10}", a_string) if any(map(s.startswith, matches))]
# ['6401104219', '6401104202']
If you know the prefixes are all 3 digits long, you can do better:
matches = set(matches)
result = [s for s in re.findall(r"\d{10}", a_string) if s[:3] in matches]
You will have to change the regex to r"\b(\d{10})\b" if you want to exclude 10-digit prefixes of longer numbers.
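To see what the word boundaries change, here is a small sketch. The sample text (with a 12-digit number) and the shortened prefix set are made up for illustration:

```python
import re

# Hypothetical sample text: the last number has 12 digits.
text = "6401104219 and 640110420212"
prefixes = {"640", "910"}

# Without \b, the first 10 digits of the 12-digit number are matched too.
loose = [s for s in re.findall(r"\d{10}", text) if s[:3] in prefixes]

# With \b, only standalone 10-digit numbers are matched.
strict = [s for s in re.findall(r"\b\d{10}\b", text) if s[:3] in prefixes]

print(loose)   # ['6401104219', '6401104202']
print(strict)  # ['6401104219']
```

Note that \b treats letters and underscores as word characters as well, so a number glued to a letter (for example "x6401104219") would not match the stricter pattern either.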
CodePudding user response:
You can use a regex.
- Find all the 10-digit numbers.
- Keep only the numbers that start with one of the elements in the matches list.
The code:
import re
a_string = "Some text 6401104219 and 6401104202 and 2201104202"
matches = ["240", "880", "898", "910", "920",
"960", "209", "309", "409", "471", "640"]
match = re.findall(r'\d{10}', a_string)  # find all the 10-digit numbers
# keep only the numbers that start with one of the elements in matches
ans = [i for i in match if any(map(i.startswith, matches))]
# OR
# ans = [i for i in match if i[:3] in matches]  # if every prefix is exactly 3 digits long, a membership test is enough
print(ans)
# ['6401104219', '6401104202']
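As a possible simplification of the filter step, str.startswith also accepts a tuple of prefixes, so the any(map(...)) call can be dropped. A minimal sketch using the same data:

```python
import re

a_string = "Some text 6401104219 and 6401104202 and 2201104202"
matches = ["240", "880", "898", "910", "920",
           "960", "209", "309", "409", "471", "640"]

# str.startswith returns True if the string starts with any item of the tuple.
# It needs a tuple, not a list, hence the conversion.
prefixes = tuple(matches)
ans = [s for s in re.findall(r"\d{10}", a_string) if s.startswith(prefixes)]
print(ans)  # ['6401104219', '6401104202']
```

This also works when the prefixes have different lengths.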
CodePudding user response:
a_string = "Some text 6401104219 and 6401104202 and 2201104202 and 640110420212"
matches = ["240", "880", "898", "910", "920", "960", "209", "309", "409", "471", "640"]
a_string_list = a_string.split(' ')
for i in a_string_list:
    for j in matches:
        # keep only tokens that are exactly 10 characters long and start with a prefix
        if i.startswith(j) and len(i) == 10:
            print(i)
            break
# prints 6401104219 and 6401104202
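The loop above assumes that the numbers are separated by single spaces and never checks that a token consists of digits only. A hedged variant that handles trailing punctuation and non-digit tokens could look like this; the sample text and the shortened prefix tuple are invented for the example:

```python
import string

# Made-up sample: one number is followed by a comma, one token is not numeric.
a_string = "Numbers: 6401104219, 640abcdefg and 2201104202."
matches = ("240", "640")

result = []
for token in a_string.split():
    token = token.strip(string.punctuation)  # drop surrounding commas, periods, ...
    # require exactly 10 characters, digits only, and one of the prefixes
    if len(token) == 10 and token.isdigit() and token.startswith(matches):
        result.append(token)

print(result)  # ['6401104219']
```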
CodePudding user response:
You can do it directly with re, by building the prefixes into the pattern:
import re
a_string = "Some text 6401104219 and 6401104202 and 2201104202 and 640110420212"
matches = ["240", "880", "898", "910", "920", "960", "209", "309", "409", "471", "640"]
result = re.findall(r"\b(?:" + r"|".join(matches) + r")\d{7}\b", a_string)
print(result)
# ['6401104219', '6401104202']
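The \d{7} in that pattern relies on every prefix being 3 digits long. If the prefixes can differ in length, one option is to check the prefix with a lookahead and still match the full 10 digits. The helper below is only a sketch; the name find_prefixed_numbers and its length parameter are made up for this example:

```python
import re

def find_prefixed_numbers(text, prefixes, length=10):
    """Return standalone numbers with `length` digits that start with one of `prefixes`."""
    if not prefixes:
        return []
    # re.escape keeps the pattern valid even if a prefix contains regex metacharacters.
    alternation = "|".join(re.escape(p) for p in prefixes)
    # The lookahead checks the prefix without consuming it, so \d{length}
    # always matches the whole number regardless of the prefix length.
    pattern = rf"\b(?=(?:{alternation}))\d{{{length}}}\b"
    return re.findall(pattern, text)

a_string = "Some text 6401104219 and 6401104202 and 2201104202 and 640110420212"
matches = ["240", "880", "898", "910", "920", "960", "209", "309", "409", "471", "640"]
print(find_prefixed_numbers(a_string, matches))
# ['6401104219', '6401104202']
print(find_prefixed_numbers(a_string, ["64011", "22"]))
# ['6401104219', '6401104202', '2201104202']
```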