How do I break down a string into multiple components?

Time:12-28

I have a string 'ABCAPITAL23JAN140CE'. This is the symbol for an option traded on a stock exchange. The ABCAPITAL part of the string is the company name, 23 is the year 2023, JAN is the month, 140 is the strike price, and CE is the type of the option.

All these components can vary for different options.

I need a function such that pieces_of_string = splitstring('ABCAPITAL23JAN140CE')

where pieces_of_string = ['ABCAPITAL', 23, 'JAN', 140, 'CE'] is returned

How do I do that?

CodePudding user response:

You might use re.findall with [A-Z]+|\d+

See the matches here on regex101

import re
print(re.findall(r"[A-Z]+|\d+", "ABCAPITAL23JAN140CE"))

# Or converting the digit runs to int
print([int(v) if v.isdigit() else v for v in re.findall(r"[A-Z]+|\d+", "ABCAPITAL23JAN140CE")])

Output

['ABCAPITAL', '23', 'JAN', '140', 'CE']
['ABCAPITAL', 23, 'JAN', 140, 'CE']
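
The question asks for a function named splitstring, so the findall approach above could be wrapped as one. This is only a minimal sketch: it assumes every symbol consists solely of uppercase letters and digits, so an underlying whose name contains digits or punctuation would be split differently.

```python
import re

def splitstring(symbol):
    """Split a symbol into alternating runs of uppercase letters and digits."""
    pieces = re.findall(r"[A-Z]+|\d+", symbol)
    # Digit runs become int, letter runs stay as str.
    return [int(p) if p.isdigit() else p for p in pieces]

pieces_of_string = splitstring("ABCAPITAL23JAN140CE")
print(pieces_of_string)
```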

Another option uses 4 capture groups around a non-capturing group that matches the abbreviated month (JAN, FEB, etc.), so the month itself is not part of the result:

^(\S*?)(\d+)(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)(\d+)(\S+)$

See the capture group matches on regex101

import re
m = re.match(r"(\S*?)(\d+)(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)(\d+)(\S+)$", "ABCAPITAL23JAN140CE")
if m:
    print(list(m.groups()))

Output

['ABCAPITAL', '23', '140', 'CE']
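
The month is missing from that output because it sits in a non-capturing group. If all five pieces are wanted, one possible variation is to capture the month too. The sketch below uses named groups; the function name parse_option_symbol, the constant OPTION_SYMBOL, and the group names are hypothetical and not part of the original answer.

```python
import re

# Hypothetical pattern: same structure as the answer above, with the month captured.
OPTION_SYMBOL = re.compile(
    r"(?P<name>\S*?)"
    r"(?P<year>\d+)"
    r"(?P<month>JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)"
    r"(?P<strike>\d+)"
    r"(?P<kind>\S+)"
)

def parse_option_symbol(symbol):
    """Return [name, year, month, strike, kind], or None if the symbol does not match."""
    m = OPTION_SYMBOL.fullmatch(symbol)
    if m is None:
        return None
    return [m["name"], int(m["year"]), m["month"], int(m["strike"]), m["kind"]]

print(parse_option_symbol("ABCAPITAL23JAN140CE"))
```

Unlike a plain letters-versus-digits split, this version rejects strings that do not follow the name, year, month, strike, type layout.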

CodePudding user response:

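def splitstring(s):
    l = [s[0]]
    for h in s[1:]:
        # Pair the new character with the first character of the current piece.
        H = h + l[-1][0]
        # Both digits or both letters: same kind, so extend the current piece.
        if H.isdigit() or H.isalpha():
            l[-1] += h
        else:
            l.append(h)
    return l

print(splitstring('ABCAPITAL23JAN140CE'))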
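
The same idea, grouping consecutive characters of the same kind, can also be sketched with itertools.groupby from the standard library. This is an alternative technique rather than the answer's own code, and the name split_runs is hypothetical. Like the loop above, it returns every piece as a string.

```python
from itertools import groupby

def split_runs(symbol):
    """Group consecutive characters by whether they are digits."""
    # groupby starts a new group each time str.isdigit flips between True and False.
    return ["".join(run) for _, run in groupby(symbol, key=str.isdigit)]

print(split_runs("ABCAPITAL23JAN140CE"))
```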

CodePudding user response:

This method tests whether two adjacent characters are of the same type. If they are, the character is added to the current piece; otherwise the current piece is closed and a new one is started.

    st = 'ABCAPITAL23JAN140CE'
    l = []
    s = ""
    for i in range(len(st)):
        s = s + st[i]
        # Close the current piece at the end of the string
        # or when the next character is of a different type.
        if i == len(st) - 1 or st[i].isnumeric() != st[i + 1].isnumeric():
            if s.isnumeric():
                l.append(int(s))
            else:
                l.append(s)
            s = ""
    print(l)

Output:

['ABCAPITAL', 23, 'JAN', 140, 'CE']

CodePudding user response:

import re
print(re.findall(r"[A-Z]+|\d+", "ABCAPITAL23JAN140CE"))
