Let's say I have a couple of strings that look like:
data_20220110_073030.gz
ndsfhsfihso_20100330-100210.gz
l0dnd74n-19981001.180800.gz
I only want to extract the parts that consist of exactly 8 or exactly 6 digits (0-9). Ideally, each string's matches would be output to a single array / list such as:
[20220110,073030]
[20100330,100210]
[19981001,180800]
I know one can use regex, but I can't seem to get it into an array.
Answer:
You may use the following pattern:
(?<!\d)\d{6}(?:\d\d)?(?!\d)
Details:
(?<!\d) - Not immediately preceded by a digit.
\d{6} - Match exactly 6 digits.
(?:\d\d)? - And (optionally) two more digits.
(?!\d) - Not immediately followed by a digit.
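As a quick sanity check of what the lookarounds do, the sketch below runs the same pattern on a few made-up strings (the samples are hypothetical, chosen only for illustration). Runs of exactly 6 or 8 digits match whole, while runs of 7 or 10 digits produce no match at all instead of being cut into pieces.

```python
import re

regex = r"(?<!\d)\d{6}(?:\d\d)?(?!\d)"

# Hypothetical sample strings, one per run length
samples = {
    "six": "a_123456_b",
    "seven": "a_1234567_b",
    "eight": "a_12345678_b",
    "ten": "a_1234567890_b",
}

# findall returns every non-overlapping match in each sample
results = {name: re.findall(regex, s) for name, s in samples.items()}
for name, found in results.items():
    print(name, found)
```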
Python example:
import re
regex = r"(?<!\d)\d{6}(?:\d\d)?(?!\d)"
test_str = """data_20220110_073030.gz
ndsfhsfihso_20100330-100210.gz
l0dnd74n-19981001.180800.gz"""
arr = re.findall(regex, test_str)
print(arr)
Output:
['20220110', '073030', '20100330', '100210', '19981001', '180800']
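The output above is one flat list. To get one list per filename, as in the question, a minimal sketch (assuming one filename per line, as in test_str) is to call findall on each line separately. The matches stay strings, which keeps the leading zero in 073030; converting them to int would drop it.

```python
import re

regex = r"(?<!\d)\d{6}(?:\d\d)?(?!\d)"
test_str = """data_20220110_073030.gz
ndsfhsfihso_20100330-100210.gz
l0dnd74n-19981001.180800.gz"""

# One findall per line keeps each filename's date and time together
per_line = [re.findall(regex, line) for line in test_str.splitlines()]
print(per_line)
```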
Answer:
You can use Python's regular expression module, re, to find the character sequences that match the pattern you are looking for.
Example
import re
text = 'data_20220110_073030.gz ndsfhsfihso_20100330-100210.gz l0dnd74n 19981001.180800.gz'
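# Raw strings avoid invalid-escape warnings; the lookarounds stop a match inside a longer digit run
x = re.findall(r'(?<!\d)\d{6}(?!\d)', text)  # 6-digit sequences
y = re.findall(r'(?<!\d)\d{8}(?!\d)', text)  # 8-digit sequences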
print(y)
print(x)
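The lookarounds matter here: a bare 6-digit pattern also matches the first six digits of an 8-digit date. A quick comparison on one of the question's filenames:

```python
import re

name = "data_20220110_073030.gz"

# Without boundaries, \d{6} also grabs the first six digits of the 8-digit date
unguarded = re.findall(r"\d{6}", name)
# With lookarounds, only a run of exactly six digits matches
guarded = re.findall(r"(?<!\d)\d{6}(?!\d)", name)

print(unguarded)  # ['202201', '073030']
print(guarded)    # ['073030']
```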
You can improve on the x/y version by having a function build the pattern based on the number of digits you want:
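import re
def digitSequence(length: int, text: str) -> list:
    # Build e.g. \d\d\d\d\d\d for length 6, guarded so it cannot match inside a longer digit run
    pattern = r'(?<!\d)' + r'\d' * length + r'(?!\d)'
    return re.findall(pattern, text)  # returns a list of the matches found
# text is the string defined in the previous example
print(digitSequence(8, text))
print(digitSequence(6, text))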
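A different technique, not taken from either answer, avoids lookarounds entirely: grab every maximal run of digits with \d+ and keep only the runs whose length is in the set you want. The helper name digit_runs is hypothetical. Because \d+ is greedy, each match is a whole run, so a 7- or 10-digit run is rejected rather than split, and 6- and 8-digit values come back in the order they appear.

```python
import re

def digit_runs(text: str, lengths=(6, 8)) -> list:
    """Return maximal digit runs whose length is in `lengths`, in order of appearance."""
    # \d+ is greedy, so each match is an entire run of consecutive digits
    return [run for run in re.findall(r"\d+", text) if len(run) in lengths]

for name in ["data_20220110_073030.gz",
             "ndsfhsfihso_20100330-100210.gz",
             "l0dnd74n-19981001.180800.gz"]:
    print(digit_runs(name))
```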