import pandas as pd
df = pd.DataFrame(data={'data': ['2 (B) - 15 (K)']})
print(df)
Current DataFrame:
             data
0  2 (B) - 15 (K)
What I am looking to do is extract 2, B, 15 and K into 4 new columns within the same DataFrame.
Is that possible using regex directly in pandas?
CodePudding user response:
You can extract every run of letters or digits using str.extractall, then unstack the result:
>>> df.data.str.extractall(r"([A-Za-z0-9]+)").unstack()
       0
match  0  1   2  3
0      2  B  15  K
To re-assign the extracted values to the original dataframe, you can use:
df[["col1", "col2", "col3", "col4"]] = df.data.str.extractall(r"([A-Za-z0-9]+)").unstack()
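For reference, here is a minimal end-to-end sketch of the same idea. It is only an illustration: the names col1 to col4 are placeholders, and it assumes no row has more than four matches. It uses join instead of direct assignment so that rows without any match (which extractall drops) end up as NaN rather than causing a shape mismatch.

```python
import pandas as pd

df = pd.DataFrame({"data": ["2 (B) - 15 (K)"]})

# extractall returns one row per match, indexed by (original row, match number)
matches = df["data"].str.extractall(r"([A-Za-z0-9]+)")

# Pivot the match number into columns: one row per original row
wide = matches[0].unstack()

# Placeholder column names; assumes at most four matches per row
wide.columns = ["col1", "col2", "col3", "col4"]

# join aligns on the index, so rows with no matches get NaN
df = df.join(wide)
print(df)
```

Note that the extracted values are strings, so "2" and "15" would still need a conversion such as pd.to_numeric if numbers are required.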
CodePudding user response:
If every row follows the same space-parenthesis-dash layout (or is an empty string), str.extract with one capture group per value also works:
df = pd.DataFrame(data={'data': ['2 (B) - 15 (K)', '']})
print(df['data'].str.extract(r'(\d*).\((.)\).-.(\d*).\((.)\)'))
#      0    1    2    3
# 0    2    B   15    K
# 1  NaN  NaN  NaN  NaN
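A possible variation on this, sketched under the assumption that the layout is always "number (letter) - number (letter)": named capture groups let str.extract produce labelled columns directly. The group names num1, letter1, num2 and letter2 are made up for illustration, and \s* is used in place of . to tolerate a varying amount of whitespace.

```python
import pandas as pd

df = pd.DataFrame({"data": ["2 (B) - 15 (K)", ""]})

# Named groups become the column names of the extracted frame
pattern = (
    r"(?P<num1>\d+)\s*\((?P<letter1>\w)\)"
    r"\s*-\s*"
    r"(?P<num2>\d+)\s*\((?P<letter2>\w)\)"
)

parts = df["data"].str.extract(pattern)

# Attach the new columns to the original frame; non-matching rows get NaN
df = df.join(parts)

# Extracted values are strings; convert the numeric ones explicitly
df["num1"] = pd.to_numeric(df["num1"])
df["num2"] = pd.to_numeric(df["num2"])
print(df)
```

Because one row has no match, the numeric columns hold a missing value and are stored as floats rather than integers.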