Python pandas regex extract to 4 new columns

import pandas as pd
df = pd.DataFrame(data={'data': ['2 (B) - 15 (K)']})
print(df)

Current DataFrame:

             data
0  2 (B) - 15 (K)

What I am looking to do is extract 2, B, 15 and K into 4 new columns within the same DataFrame.

Is that possible using pandas' regex string methods directly?

CodePudding user response：

You can extract every run of letters or digits using str.extractall, then unstack the result so that each match becomes its own column:

>>> df.data.str.extractall("([A-Za-z0-9]+)").unstack()

       0
match  0  1   2  3
0      2  B  15  K

To re-assign the extracted values to the original dataframe, you can use:

df[["col1", "col2", "col3", "col4"]] = df.data.str.extractall("([A-Za-z0-9]+)").unstack()
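
The same idea can be spelled out step by step, which makes it easier to see what extractall and unstack each contribute. This is only a sketch: the names col1 to col4 are placeholders, and it assumes every row yields exactly four matches (rows with fewer matches would leave NaN in the trailing columns, and rows with more would add extra columns).

```python
import pandas as pd

df = pd.DataFrame({"data": ["2 (B) - 15 (K)"]})

# extractall returns one row per match, indexed by (original row, match number)
matches = df["data"].str.extractall(r"([A-Za-z0-9]+)")

# unstack pivots the match number into columns, giving one row per original row
wide = matches.unstack()

# Replace the (group, match) MultiIndex columns with flat placeholder names
wide.columns = ["col1", "col2", "col3", "col4"]

# Join on the index so the new columns sit next to the original one
result = df.join(wide)
print(result)
```

Joining on the index instead of assigning directly keeps the original df untouched, which can be convenient while experimenting with the pattern.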

CodePudding user response：

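If every row follows the same space-parenthesis-dash pattern, str.extract with one capture group per value also works, and rows that do not match (such as empty strings) come back as NaN: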

df = pd.DataFrame(data={'data': ['2 (B) - 15 (K)', '']})
print(df['data'].str.extract(r'(\d*).\((.)\).-.(\d*).\((.)\)'))
#      0    1    2    3
# 0    2    B   15    K
# 1  NaN  NaN  NaN  NaN
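
A possible variation on the str.extract approach uses named groups, so the new columns get labels straight from the pattern. The group names num1, code1, num2 and code2 are hypothetical, and the stricter pattern (literal spaces, \d+ instead of \d*) is an assumption about the data, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"data": ["2 (B) - 15 (K)", ""]})

# Named groups (hypothetical names) become the column labels of the result
pattern = r"(?P<num1>\d+) \((?P<code1>.)\) - (?P<num2>\d+) \((?P<code2>.)\)"
parts = df["data"].str.extract(pattern)

# Extracted values are strings; convert the digit columns to numbers.
# Rows that did not match stay NaN.
for col in ["num1", "num2"]:
    parts[col] = pd.to_numeric(parts[col])

# Attach the four new columns to the original DataFrame
result = df.join(parts)
print(result)
```

Because str.extract returns exactly one row per input row, the join lines up on the index without any reshaping.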