Sorry if I haven't phrased the question well. Here is my code:
import re

data1 = ['TOOK22JAN1515100HG','BOGGOK22MAR1742200HG']
data2 = ['TOOK2231515100HG','BOGGOK2221643200GH']
for i in data1:
    splt_1 = re.split(r'(TOOK|BOGGOK)([0-9]{2})(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)([0-9]{2})([0-9]{5})(HG|GH)', i)
    print('data1:', splt_1)
for i in data2:
    splt_2 = re.split(r'(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)', i)
    print('data2:', splt_2)
Output:
data1: ['', 'TOOK', '22', 'JAN', '15', '15100', 'HG', '']
data1: ['', 'BOGGOK', '22', 'MAR', '17', '42200', 'HG', '']
data2: ['', 'TOOK', '22315', '15100', 'HG', '']
data2: ['', 'BOGGOK', '22216', '43200', 'GH', '']
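The empty strings at both ends of each list come from how re.split works: when the pattern contains capturing groups, the captured text is inserted between the pieces of the string that lie before and after the match. Here the pattern consumes the whole string, so both outer pieces are empty. A minimal illustration with a simpler, made-up pattern:

```python
import re

# Text before and after the match is kept; the captured group is inserted between.
print(re.split(r'(\d+)', 'abc123def'))  # ['abc', '123', 'def']

# When the match covers the whole string, the outer pieces are empty strings.
print(re.split(r'(\d+)', '123'))  # ['', '123', '']
```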
What I want to do:
Given this list:
data = ['TOOK22JAN1515100HG','BOGGOK22MAR1742200HG', 'TOOK2231515100HG','BOGGOK2221643200GH']
I want to loop over it and split each item using whichever of these two patterns fits:
re.split(r'(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)', i) or
re.split(r'(TOOK|BOGGOK)([0-9]{2})(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)([0-9]{2})([0-9]{5})(HG|GH)', i)
P.S.: The output can be in the same or a similar format.
I tried this code:
import re

data5 = ['TOOK22JAN1515100HG','BOGGOK22MAR1742200HG','TOOK2231515100HG','BOGGOK2221643200GH']
for i in data5:
    dk = re.split(r'(TOOK|BOGGOK)([0-9]{2})(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)([0-9]{2})([0-9]{5})(HG|GH)|(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)', i)
    print(dk)
Result
['', 'TOOK', '22', 'JAN', '15', '15100', 'HG', None, None, None, None, '']
['', 'BOGGOK', '22', 'MAR', '17', '42200', 'HG', None, None, None, None, '']
['', None, None, None, None, None, None, 'TOOK', '22315', '15100', 'HG', '']
['', None, None, None, None, None, None, 'BOGGOK', '22216', '43200', 'GH', '']
The result I want:
['', 'TOOK', '22', 'JAN', '15', '15100', 'HG', '']
['', 'BOGGOK', '22', 'MAR', '17', '42200', 'HG', '']
['', 'TOOK', '22315', '15100', 'HG', '']
['', 'BOGGOK', '22216', '43200', 'GH', '']
or
['TOOK', '22', 'JAN', '15', '15100', 'HG']
['BOGGOK', '22', 'MAR', '17', '42200', 'HG']
['TOOK', '22315', '15100', 'HG']
['BOGGOK', '22216', '43200', 'GH']
Thank you for taking the time to answer my question; I really appreciate it.
Answer:
First Solution:
import re

data5 = ['TOOK22JAN1515100HG','BOGGOK22MAR1742200HG','TOOK2231515100HG','BOGGOK2221643200GH']
for i in range(len(data5)):
    # Split with the combined pattern, then drop the None entries and the empty strings at both ends
    data5[i] = [item for item in re.split(r'(TOOK|BOGGOK)([0-9]{2})([A-Z]{3})([0-9]{2})([0-9]{5})(HG|GH)|(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)', data5[i]) if item is not None and len(item) > 0]
print(data5)
Second Solution:
import re

def resultant_string_list(data5):
    # Build a new list: for each input string, keep only the non-None, non-empty fields
    return [[item for item in re.split(r'(TOOK|BOGGOK)([0-9]{2})([A-Z]{3})([0-9]{2})([0-9]{5})(HG|GH)|(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)', s) if item is not None and len(item) > 0] for s in data5]

data5 = ['TOOK22JAN1515100HG','BOGGOK22MAR1742200HG','TOOK2231515100HG','BOGGOK2221643200GH']
print(resultant_string_list(data5))
Output for both snippets above:
[['TOOK', '22', 'JAN', '15', '15100', 'HG'], ['BOGGOK', '22', 'MAR', '17', '42200', 'HG'], ['TOOK', '22315', '15100', 'HG'], ['BOGGOK', '22216', '43200', 'GH']]
With the first solution, each element of your existing data5 list is replaced in place by its resulting list instead of being copied into a separate result list. Example:
Before: data5[0] = "TOOK22JAN1515100HG"
After: data5[0] = ['TOOK', '22', 'JAN', '15', '15100', 'HG']
Answer:
You can remove all the None values from the result of your code:
import re

data5 = ['TOOK22JAN1515100HG','BOGGOK22MAR1742200HG','TOOK2231515100HG','BOGGOK2221643200GH']
for i in data5:
    dk = re.split(r'(TOOK|BOGGOK)([0-9]{2})(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)([0-9]{2})([0-9]{5})(HG|GH)|(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)', i)
    dk = list(filter(lambda a: a is not None, dk))  # This removes all the None values from your list
    print(dk)
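Filtering out only None keeps the empty strings at both ends, which gives the first of the two desired formats. For the second format, one variant is to pass None as the filter function: filter(None, ...) keeps only truthy items, so it drops both None and ''. A minimal sketch using the same combined pattern:

```python
import re

PATTERN = (r'(TOOK|BOGGOK)([0-9]{2})(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)([0-9]{2})([0-9]{5})(HG|GH)'
           r'|(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)')

data5 = ['TOOK22JAN1515100HG', 'BOGGOK22MAR1742200HG', 'TOOK2231515100HG', 'BOGGOK2221643200GH']

results = []
for s in data5:
    # filter(None, ...) discards every falsy item: the None groups and the empty strings.
    results.append(list(filter(None, re.split(PATTERN, s))))

for r in results:
    print(r)
```

Non-empty strings such as '0' are truthy, so no real field is lost by this filter.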
Answer:
I've got two ideas for you:
import re

data5 = ['TOOK22JAN1515100HG','BOGGOK22MAR1742200HG','TOOK2231515100HG','BOGGOK2221643200GH']

# first approach (ugly): keep the current, simple code, then get rid of the Nones
for i in data5:
    dk = re.split(r'(TOOK|BOGGOK)([0-9]{2})(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)([0-9]{2})([0-9]{5})(HG|GH)|(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)', i)
    dk = [s for s in dk if s is not None]
    print(dk)

# second approach: check which case holds, then split with the matching pattern
for i in data5:
    if re.search(r'JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC', i):
        dk = re.split(r'(TOOK|BOGGOK)([0-9]{2})(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)([0-9]{2})([0-9]{5})(HG|GH)', i)
    else:
        dk = re.split(r'(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)', i)
    print(dk)
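The second approach can also be written without a separate month check: keep the patterns in a list and use the first one that matches the whole string. This sketch swaps re.split for re.fullmatch and Match.groups(), which return only the captured fields, so there are no empty strings or None values to clean up. The names PATTERNS and split_record are illustrative, not from the original code.

```python
import re

# Tried in order; the month-based layout comes first, as in the if/else approach.
PATTERNS = [
    re.compile(r'(TOOK|BOGGOK)([0-9]{2})(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)([0-9]{2})([0-9]{5})(HG|GH)'),
    re.compile(r'(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)'),
]


def split_record(s):
    """Return the captured fields of the first pattern that matches all of s, or None."""
    for pattern in PATTERNS:
        m = pattern.fullmatch(s)
        if m:
            return list(m.groups())
    return None  # neither layout matched


data5 = ['TOOK22JAN1515100HG', 'BOGGOK22MAR1742200HG', 'TOOK2231515100HG', 'BOGGOK2221643200GH']
for s in data5:
    print(split_record(s))
```

One design difference: with fullmatch, a string that fits neither layout yields None, whereas re.split with no match would return the unsplit string in a one-element list.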
Answer:
Try using the filter function to remove None from the lists:
dk = list(filter(lambda x: x is not None, re.split(your_pattern, i)))  # replace your_pattern with your regular expression
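You get None because the pattern has two alternatives, each with its own capturing groups. re.split returns every group in the pattern, and the groups belonging to the alternative that did not match come back as None.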
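You can see this directly with re.match on the combined pattern: groups() returns one entry per capturing group in the whole pattern (ten here), and the groups of the alternative that did not take part in the match are None. Match.lastindex, the index of the last group that matched, also tells you which alternative was used. A short sketch:

```python
import re

COMBINED = (r'(TOOK|BOGGOK)([0-9]{2})(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)([0-9]{2})([0-9]{5})(HG|GH)'
            r'|(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)')

m1 = re.match(COMBINED, 'TOOK22JAN1515100HG')
print(m1.groups())
# ('TOOK', '22', 'JAN', '15', '15100', 'HG', None, None, None, None)

m2 = re.match(COMBINED, 'TOOK2231515100HG')
print(m2.groups())
# (None, None, None, None, None, None, 'TOOK', '22315', '15100', 'HG')

# lastindex is 6 when the first alternative matched and 10 when the second did.
print(m1.lastindex, m2.lastindex)  # 6 10
```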