I am not very good at regex and am trying to improve. I am trying to split a string column that has values like:
1000+10%
300-7%
I am using
pattern = r'[+|-]'
df[postsplitcol] = df[col].str.split(pattern)
However, the + or the - sign also gets removed. How can I retain the + or the - in the right-hand string after the split? I have looked at some similar questions on the forum, but could not find an efficient way that handles both of the delimiters I am splitting on.
CodePudding user response:
df['col'].str.split(r"(?=[+-])", expand=True)
      0     1
0  1000  +10%
1   300   -7%
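A minimal sketch of why the lookahead keeps the sign, using plain `re` and the sample values from the question. `(?=[+-])` matches the empty position just before a `+` or `-` without consuming it, so the split happens there and the sign stays in the right-hand piece. This assumes Python 3.7 or later, where `re.split` can split on empty matches; the variable names are only illustrative.

```python
import re

values = ["1000+10%", "300-7%"]

# Zero-width lookahead: split at the position before + or -, consuming nothing.
keep_sign = re.compile(r"(?=[+-])")
# Plain character class: the matched sign is consumed, so it is dropped.
drop_sign = re.compile(r"[+-]")

kept = [keep_sign.split(v) for v in values]
dropped = [drop_sign.split(v) for v in values]

print(kept)
print(dropped)
```

Note that a value starting with a sign (for example a negative number on the left) would produce an empty first piece with this approach.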
CodePudding user response:
Use str.extract here:
df[["num", "pct"]] = df["col"].str.extract(r'^(\d+(?:\.\d+)?)([+-]\d+%)$')
Here is an explanation of the regex pattern used:
^             from the start of the column
(             open first capture group
\d+           match an integer
(?:\.\d+)?    with an optional decimal component
)             close first capture group
(             open second capture group
[+-]          match +/-
\d+           match an integer
%             match %
)             close second capture group
$             end of the column
We then assign the two captured groups to the two columns named on the left-hand side.
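As a rough end-to-end check of this approach, the sketch below runs the pattern on the two values from the question plus two made-up rows (`12.5+3%` and `n/a`, which are assumptions added only for illustration). It assumes a reasonably recent pandas, where a DataFrame can be assigned to a list of new column names.

```python
import pandas as pd

# The first two values come from the question; the last two are hypothetical.
df = pd.DataFrame({"col": ["1000+10%", "300-7%", "12.5+3%", "n/a"]})

# Group 1: integer with optional decimal part. Group 2: sign, integer, percent.
pattern = r"^(\d+(?:\.\d+)?)([+-]\d+%)$"

# One output column per capture group; rows that do not match become NaN.
df[["num", "pct"]] = df["col"].str.extract(pattern)

print(df)
```

Because the pattern is anchored with `^` and `$`, a row that does not fit the expected shape yields missing values in both columns instead of a partial split, which makes malformed input easy to spot.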
CodePudding user response:
import pandas as pd
df = pd.DataFrame({'col1': ['1000+10%', '300-7%']})
df['col1'].str.split(r'\+|-', expand=True)
Output:
      0    1
0  1000  10%
1   300   7%