Python pandas regex extract to 4 new columns

Time:10-06

import pandas as pd
df = pd.DataFrame(data={'data': ['2 (B) - 15 (K)']})
print(df)

Current DataFrame:

             data
0  2 (B) - 15 (K)

What I'm looking to do is extract 2, B, 15 and K into 4 new columns within the same DataFrame.

Is that possible directly with pandas' regex string methods?

Answer:

You can extract every run of letters or digits with str.extractall, then unstack the result so that each match becomes its own column:

>>> df.data.str.extractall(r"([A-Za-z0-9]+)").unstack()

       0
match  0  1   2  3
0      2  B  15  K

To assign the extracted values to new columns of the original DataFrame, you can use:

df[["col1", "col2", "col3", "col4"]] = df.data.str.extractall(r"([A-Za-z0-9]+)").unstack()
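A self-contained sketch of the same extractall/unstack approach, for anyone who wants to run it end to end. It writes the pattern as a raw string and uses `[A-Za-z0-9]+` so that multi-digit numbers such as 15 (and digits containing 0) stay in a single match. The column names `col1`..`col4` are only illustrative. Instead of assigning to a list of columns, it renames the unstacked columns and joins them back on the row index.

```python
import pandas as pd

df = pd.DataFrame(data={'data': ['2 (B) - 15 (K)']})

# Every run of letters/digits becomes one row, indexed by
# (original row label, match number).
parts = df['data'].str.extractall(r'([A-Za-z0-9]+)')

# Move the match number into the columns: one column per match.
wide = parts[0].unstack()

# Illustrative column names; join aligns on the original row index.
wide.columns = [f'col{i + 1}' for i in wide.columns]
df = df.join(wide)
print(df)
```

Because `join` aligns on the index, a row with no match at all, or with fewer than four matches, simply gets NaN in the missing columns rather than shifting values between rows.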

Answer:

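If every value follows the same layout (number, space, letter in parentheses, space, dash, ...) and some values may be empty strings, str.extract with one capture group per field works:

df = pd.DataFrame(data={'data': ['2 (B) - 15 (K)', '']})
print(df['data'].str.extract(r'(\d*).\((.)\).-.(\d*).\((.)\)'))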
#      0    1    2    3
# 0    2    B   15    K
# 1  NaN  NaN  NaN  NaN
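A variant of the str.extract approach, sketched under the assumption that each value always has the shape "number (letter) - number (letter)". Named groups (the names `num1`, `let1`, `num2`, `let2` are hypothetical) become the column names directly, `\s*` tolerates missing or extra spaces where the pattern above uses `.`, and the result is concatenated back onto the original DataFrame. Converting the numeric fields with `pd.to_numeric` is optional; since unmatched rows are NaN, they come back as floats.

```python
import pandas as pd

df = pd.DataFrame(data={'data': ['2 (B) - 15 (K)', '']})

# One named group per field; group names become column names.
# \s* allows optional whitespace around the parentheses and the dash.
pattern = (
    r'(?P<num1>\d+)\s*\((?P<let1>[A-Za-z])\)\s*-\s*'
    r'(?P<num2>\d+)\s*\((?P<let2>[A-Za-z])\)'
)

# Rows that do not match (e.g. the empty string) get NaN in every column.
parts = df['data'].str.extract(pattern)

# Optional: turn the numeric fields into numbers (NaN stays NaN).
for col in ['num1', 'num2']:
    parts[col] = pd.to_numeric(parts[col])

# Attach the new columns to the original DataFrame, aligned on the index.
df = pd.concat([df, parts], axis=1)
print(df)
```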