I have this dataset where there is a column named 'Discount' and the values are given as '20% off', '25% off' etc.
What i want is to keep just the number in the column and remove the % symbol and the 'off' string.
I'm using this formula to do it.
df['discount'] = df['discount'].apply(lambda x: x.lstrip('%').rstrip('off')
However, when i apply that formula, all the values in the column 'discount' becomes "nan".
I even used this formula as well,
df['discount'] = df['discount'].str.replace('off' , '')
However, this does the same thing.
Is there any other way of handling this? I just want to make all the values in that column to be just the number which is like 25, 20, 10 and get rid of that percentage sign and the string value.
CodePudding user response:
Try this:
d['discount'] = d['discount'].str.replace(r'(%|\s*off)', '', regex=True).astype(int)
Output:
>>> df
discount
0 20
1 25
CodePudding user response:
I came up with this solution:
d['discount'] = d['discount'].split("%")[0]
or as int:
d['discount'] = int(d['discount'].split("%")[0])
We chop the string in two pieces at the %-sign and then take the first part, the number.
CodePudding user response:
If you have a fixed % off
suffix, the most efficient is to just remove the last 5 characters:
d['discount'] = d['discount'].str[:-5].astype(int)