How do I split a list using two different methods in a for loop?

Time:02-12

Apologies if the question is not phrased well. Here is my code:

import re

data1 = ['TOOK22JAN1515100HG','BOGGOK22MAR1742200HG']
data2 = ['TOOK2231515100HG','BOGGOK2221643200GH']

for i in data1:
  splt_1 = re.split(r'(TOOK|BOGGOK)([0-9]{2})(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)([0-9]{2})([0-9]{5})(HG|GH)', i)
  print('data1:', splt_1)

for i in data2:
  splt_2 = re.split(r'(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)', i)
  print('data2:', splt_2)

Output result

data1: ['', 'TOOK', '22', 'JAN', '15', '15100', 'HG', '']
data1: ['', 'BOGGOK', '22', 'MAR', '17', '42200', 'HG', '']

data2: ['', 'TOOK', '22315', '15100', 'HG', '']
data2: ['', 'BOGGOK', '22216', '43200', 'GH', '']

What I want to do

Given

data = ['TOOK22JAN1515100HG','BOGGOK22MAR1742200HG', 'TOOK2231515100HG','BOGGOK2221643200GH']

I want to loop over the data list and split each item using whichever of these two patterns matches:

re.split(r'(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)', i) or
re.split(r'(TOOK|BOGGOK)([0-9]{2})(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)([0-9]

P.S. The output can be in the same or a similar format.

I tried this code:

data5 = ['TOOK22JAN1515100HG','BOGGOK22MAR1742200HG','TOOK2231515100HG','BOGGOK2221643200GH']

for i in data5:
  dk = re.split(r'(TOOK|BOGGOK)([0-9]{2})(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)([0-9]{2})([0-9]{5})(HG|GH)|(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)', i)
  print(dk)

Result

['', 'TOOK', '22', 'JAN', '15', '15100', 'HG', None, None, None, None, '']
['', 'BOGGOK', '22', 'MAR', '17', '42200', 'HG', None, None, None, None, '']
['', None, None, None, None, None, None, 'TOOK', '22315', '15100', 'HG', '']
['', None, None, None, None, None, None, 'BOGGOK', '22216', '43200', 'GH', '']

Result I want

['', 'TOOK', '22', 'JAN', '15', '15100', 'HG', '']
['', 'BOGGOK', '22', 'MAR', '17', '42200', 'HG', '']
['', 'TOOK', '22315', '15100', 'HG', '']
['', 'BOGGOK', '22216', '43200', 'GH', '']

or

['TOOK', '22', 'JAN', '15', '15100', 'HG']
['BOGGOK', '22', 'MAR', '17', '42200', 'HG']
['TOOK', '22315', '15100', 'HG']
['BOGGOK', '22216', '43200', 'GH']

Thank you for taking the time to answer my question. I really appreciate it.

CodePudding user response:

First Solution:

import re

data5 = ['TOOK22JAN1515100HG','BOGGOK22MAR1742200HG','TOOK2231515100HG','BOGGOK2221643200GH']
for i in range(len(data5)):
    data5[i] = [item for item in (re.split(r'(TOOK|BOGGOK)([0-9]{2})([A-Z]{3})([0-9]{2})([0-9]{5})(HG|GH)|(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)', data5[i])) if (item is not None) and len(item)>0]
print(data5)

Second Solution:

def resultant_string_list(data5):
    return [([item for item in (re.split(r'(TOOK|BOGGOK)([0-9]{2})([A-Z]{3})([0-9]{2})([0-9]{5})(HG|GH)|(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)', data5[i])) if (item is not None) and len(item)>0]) for i in range(len(data5))]

data5 = ['TOOK22JAN1515100HG','BOGGOK22MAR1742200HG','TOOK2231515100HG','BOGGOK2221643200GH']
print(resultant_string_list(data5))

Output of both solutions:

[['TOOK', '22', 'JAN', '15', '15100', 'HG'], ['BOGGOK', '22', 'MAR', '17', '42200', 'HG'], ['TOOK', '22315', '15100', 'HG'], ['BOGGOK', '22216', '43200', 'GH']]

The first solution is written with memory use in mind: each string in data5 is replaced in place by its resulting list (the second solution returns a new list instead). Example:

Before: data5[0] = "TOOK22JAN1515100HG"
After: data5[0] = ['TOOK', '22', 'JAN', '15', '15100', 'HG']

CodePudding user response:

You can remove all the None values in your code:

import re

data5 = ['TOOK22JAN1515100HG','BOGGOK22MAR1742200HG','TOOK2231515100HG','BOGGOK2221643200GH']

for i in data5:
  dk = re.split(r'(TOOK|BOGGOK)([0-9]{2})(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)([0-9]{2})([0-9]{5})(HG|GH)|(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)', i)
  dk = list(filter(lambda a: a is not None, dk))  # removes all the None values from the list
  print(dk)
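
As a small illustration of what this filtering keeps, here is a sketch that starts from one of the lists printed in the question (copied as a literal, so no regex is needed). The names keep_empty and drop_empty are made up for this example. Filtering on "is not None" leaves the empty strings at both ends, which gives the first format asked for; filtering on truthiness drops them as well, which gives the second format.

```python
# One of the lists printed in the question, copied as a literal.
dk = ['', None, None, None, None, None, None, 'TOOK', '22315', '15100', 'HG', '']

# Remove only the None entries; the empty strings at both ends stay.
keep_empty = [s for s in dk if s is not None]

# Remove every falsy entry, i.e. both None and the empty strings.
drop_empty = [s for s in dk if s]

print(keep_empty)  # ['', 'TOOK', '22315', '15100', 'HG', '']
print(drop_empty)  # ['TOOK', '22315', '15100', 'HG']
```

Note that the truthiness filter would also drop any legitimately empty field, which is harmless here because every group in the patterns must match at least two characters.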

CodePudding user response:

I've got two ideas for you:

import re

data5 = ['TOOK22JAN1515100HG','BOGGOK22MAR1742200HG','TOOK2231515100HG','BOGGOK2221643200GH']

# first approach (ugly): keep the current, simple code, then get rid of Nones
for i in data5:
    dk = re.split(r'(TOOK|BOGGOK)([0-9]{2})(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)([0-9]{2})([0-9]{5})(HG|GH)|(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)', i)
    dk = [s for s in dk if s is not None]
    print(dk)

# second approach: condition to find out which case holds
for i in data5:
    dk = None
    if re.search(r'JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC', i):
        dk = re.split(r'(TOOK|BOGGOK)([0-9]{2})(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)([0-9]{2})([0-9]{5})(HG|GH)', i)
    else:
        dk = re.split(r'(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)', i)
    print(dk)
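
A possible variation on the second approach, sketched under the assumption that every item is exactly one code with nothing before or after it: keep the two patterns separate, try each in turn with re.fullmatch, and take the groups of the first one that matches. Since only one pattern is ever involved per item, no None values or empty strings appear. The names PATTERNS and split_code are hypothetical, not from the thread.

```python
import re

MONTHS = 'JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC'

# The two patterns from the question, compiled once and tried in order.
PATTERNS = [
    re.compile(r'(TOOK|BOGGOK)([0-9]{2})(' + MONTHS + r')([0-9]{2})([0-9]{5})(HG|GH)'),
    re.compile(r'(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)'),
]

def split_code(text):
    """Return the captured parts of text, or None if no pattern matches."""
    for pattern in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return list(match.groups())
    return None

data5 = ['TOOK22JAN1515100HG', 'BOGGOK22MAR1742200HG',
         'TOOK2231515100HG', 'BOGGOK2221643200GH']

results = [split_code(item) for item in data5]
for parts in results:
    print(parts)
```

Returning None for an unmatched item is just one choice; raising an error or skipping the item would work equally well depending on how bad input should be handled.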

CodePudding user response:

Try using the filter function to remove None from the lists:

dk = list(filter(lambda x: x is not None, re.split(r'<your regular expression here>', i)))

You get None because the pattern contains two alternatives separated by |, each with its own capturing groups: the groups that belong to the alternative that did not match come back as None in the list.
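
To see where the None values come from, here is a minimal sketch using the combined pattern from the question on one of its items. The variable names are made up for the example. Groups 1 to 6 belong to the month alternative and groups 7 to 10 to the numeric one; only the alternative that matched fills in its groups.

```python
import re

COMBINED = (r'(TOOK|BOGGOK)([0-9]{2})(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)'
            r'([0-9]{2})([0-9]{5})(HG|GH)'
            r'|(TOOK|BOGGOK)([0-9]{5})([0-9]{5})(HG|GH)')

match = re.fullmatch(COMBINED, 'TOOK2231515100HG')

# This item has no month, so groups 1-6 did not take part in the match.
all_groups = match.groups()
print(all_groups)
# (None, None, None, None, None, None, 'TOOK', '22315', '15100', 'HG')

# Keeping only the groups that took part gives the wanted list.
used_groups = [g for g in all_groups if g is not None]
print(used_groups)  # ['TOOK', '22315', '15100', 'HG']
```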
