Split a python column(str) on either pattern and retain the delimiter

I am not very good at regex and am trying to improve. I am trying to split a string column that has values like:

1000 10%

300-7%

I am using

pattern = r'[ |-]'
df[postsplitcol] = df[col].str.split(pattern)

However, the space or the - sign also gets removed by the split. How can I retain the space or the - at the start of the right-hand string after the split? I have looked at similar questions on the forum, but could not find an efficient approach that handles splitting on either of the two patterns.

CodePudding user response：

df['col'].str.split(r"(?=[ -])", expand=True)

      0     1
0  1000   10%
1   300   -7%
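
To see why this keeps the delimiter: `(?=[ -])` is a zero-width lookahead, so it matches the position just before a space or `-` without consuming the character. pandas applies the pattern to each element, so the mechanics can be sketched with the standard `re` module alone. This is a minimal illustration, assuming Python 3.7 or later (where `re.split` can split on an empty match); the names `values`, `pieces`, and `cleaned` are made up for the example.

```python
import re

values = ["1000 10%", "300-7%"]

# (?=[ -]) matches the empty position just before a space or '-',
# so nothing is consumed and the delimiter stays on the right-hand piece.
pattern = re.compile(r"(?=[ -])")

pieces = [pattern.split(v) for v in values]

# Optionally strip the leading space that is retained on the first value.
cleaned = [[p.strip() for p in row] for row in pieces]

print(pieces)
print(cleaned)
```

The leading space in ` 10%` is kept for the same reason the `-` is kept, which is why the first row of the output above is indented by one extra character.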

CodePudding user response：

Use str.extract here:

df[["num", "pct"]] = df["col"].str.extract(r'^(\d+(?:\.\d+)?)([ -]\d+%)$')

Here is an explanation of the regex pattern used:

^ from the start of the column
( open first capture group
\d+ match an integer (one or more digits)
(?:\.\d+)? with an optional decimal component
) close first capture group
( open second capture group
[ -] match a space or -
\d+ match an integer (one or more digits)
% match %
) close second capture group
$ end of the column

We then map the two captured values into the two columns specified on the LHS.
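
The two-group idea can be checked outside pandas with plain `re`, using `\d+` for one or more digits. This is only a sketch: the named groups and the helper `split_value` are illustrative additions, not part of the answer. Where this helper returns `None` for a value that does not match, `str.extract` would give `NaN` in both columns.

```python
import re

# Named groups are an illustrative addition; the structure is
# number, then a space or '-' followed by digits and '%'.
pattern = re.compile(r"^(?P<num>\d+(?:\.\d+)?)(?P<pct>[ -]\d+%)$")


def split_value(text):
    """Return (num, pct) for a value like '300-7%', or None if it does not match."""
    m = pattern.match(text)
    if m is None:
        return None
    return m.group("num"), m.group("pct")


rows = [split_value(v) for v in ["1000 10%", "300-7%", "12.5-3%", "abc"]]
print(rows)
```

Note that the second group only accepts an integer percentage; a value such as `300-7.5%` would not match unless the optional decimal part is added there too.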

CodePudding user response：

import pandas as pd
df = pd.DataFrame({'col1': ['1000 10%', '300-7%']})
# Splits on a space or '-'; the delimiter itself is dropped from the result
df['col1'].str.split(r' |-', expand=True)

Output:

      0    1
0  1000  10%
1   300   7%
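
As the output shows, a plain alternation consumes the delimiter, so the second row ends up as `7%` rather than `-7%`. If the sign has to be retained, one option is to wrap the delimiter in a capturing group so that the split returns it as a separate element. Here is a rough sketch with the standard `re` module; the variable names are illustrative and this is not the answer's own method.

```python
import re

value = "300-7%"

# A plain alternation consumes the delimiter, so the sign is lost.
dropped = re.split(r" |-", value)

# With a capturing group, re.split returns the delimiter as its own
# list element, which can then be glued back onto the right-hand piece.
left, delim, right = re.split(r"([ -])", value, maxsplit=1)
kept = [left, delim + right]

print(dropped)
print(kept)
```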