What is the best way to convert a Pandas series that contains strings of the type "10%" and "0.10" into numeric values?
I know that if I have a series with just "0.10" type strings I can just do pd.to_numeric.
I also know that if I have a series of "10%" type strings I can do str.replace("%","") and then do pd.to_numeric and divide by 100.
The issue I have is with a series that mixes "0.10" and "10%" type strings. How do I best convert this into a series with the correct numeric values?
I think I could do it by first making a temporary series with True / False depending on if the string has "%" in it or not and then based on that applying a function. But this seems inefficient.
Is there a better way?
What I Have Tried for Reference:
import pandas as pd
mixed = pd.Series(["10%","0.10","5.5%","0.02563"])
mixed.str.replace("%","").astype("float")/100
0 0.100000
1 0.001000
2 0.055000
3 0.000256
dtype: float64
# This doesn't work, because even the 0.10 and 0.02563 are divided by 100.
CodePudding user response:
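A very neat solution based on this answer is to replace "%" with "e-2", so that "10%" becomes "10e-2", which to_numeric parses as scientific notation: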
from pandas import Series, to_numeric
mixed = Series(["10%", "0.10", "5.5%", "0.02563"])
print(to_numeric(mixed.str.replace("%", "e-2")))
# 0 0.10000
# 1 0.10000
# 2 0.05500
# 3 0.02563
# dtype: float64
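If you want to convince yourself that the substitution is safe, here is a minimal sketch, assuming a reasonably recent pandas. Passing regex=False is my addition: it makes sure "%" is treated as a literal character regardless of the pandas version's default. The variable name converted is just for illustration.

```python
import pandas as pd

mixed = pd.Series(["10%", "0.10", "5.5%", "0.02563"])

# "10%" becomes "10e-2", i.e. 10 * 10**-2; strings without "%" are untouched.
as_sci = mixed.str.replace("%", "e-2", regex=False)

# to_numeric parses both plain decimals and scientific notation.
converted = pd.to_numeric(as_sci)

print(as_sci.tolist())
print(converted.dtype)
```

Because everything goes through a single to_numeric call, the result is a float64 series with no per-row Python branching.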
CodePudding user response:
The easiest solution is to select entries using a mask and handle them in bulk:
from pandas import Series, to_numeric
mixed = Series(["10%", "0.10", "5.5%", "0.02563"])
converted = mixed.copy()
# use a mask to select specific entries
mask = mixed.str.contains("%")
converted.loc[mask] = to_numeric(mixed.loc[mask].str.replace("%", "")) / 100
converted.loc[~mask] = to_numeric(mixed.loc[~mask])
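# the assignments above leave the series with object dtype, so cast to float
converted = converted.astype(float)
converted
# 0 0.10000
# 1 0.10000
# 2 0.05500
# 3 0.02563
# dtype: float64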
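A variant of the same mask idea that never assigns into a copy of the string series, so the result is float64 from the start. This is only a sketch, assuming every percent value ends with "%"; the names is_percent and values are mine.

```python
import pandas as pd

mixed = pd.Series(["10%", "0.10", "5.5%", "0.02563"])

# Boolean mask marking the entries written as percentages.
is_percent = mixed.str.endswith("%")

# Strip the trailing "%" everywhere and parse all entries in one call.
values = pd.to_numeric(mixed.str.rstrip("%"))

# Keep plain decimals as they are; divide only the percent entries by 100.
converted = values.where(~is_percent, values / 100)

print(converted.dtype)
```

Series.where keeps the original value where the condition is True and takes the replacement elsewhere, which is why the mask is negated here.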
CodePudding user response:
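You need a condition somewhere. One possible way is a plain Python comprehension over the series:
import pandas as pd
# strip "%" and divide by 100 for percent strings, otherwise convert directly
converted = pd.Series([float(x.strip('%')) / 100 if '%' in x else float(x) for x in mixed])
print(converted)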
0 0.10000
1 0.10000
2 0.05500
3 0.02563
dtype: float64