I want to convert the values in all my columns to float wherever the value consists of digits (optionally with a decimal point).
Here's what I've tried so far:
for columns in range(len(df.columns)):
    for rows in range(len(df)):
        if str(df.iloc[rows, columns]).replace('.', '', 1).isdigit():
            df.iloc[rows, columns] = float(df.iloc[rows, columns])
It works, but it takes a long time to run because the DataFrame is large. Does anyone have an idea for simpler, more efficient code?
CodePudding user response:
Does this answer your question?
df = df.apply(lambda i: i.apply(lambda x: float(x) if str(x).replace('.','',1).isdigit() else x))
CodePudding user response:
You can apply pandas.to_numeric (https://pandas.pydata.org/docs/reference/api/pandas.to_numeric.html) to all the columns. By specifying errors='coerce', the non-numeric values will be converted to NaNs.
Then pass the original DataFrame to fillna to fill the NaNs with the original non-numeric values.
This should be faster than the other answer for large DataFrames.
>>> df = pd.DataFrame([["x", 1, "2.1"], [3.2, "y", "5."]], columns=list("ABC"))
>>> df
     A  B    C
0    x  1  2.1
1  3.2  y   5.
>>> df = (
...     df.apply(pd.to_numeric, errors='coerce', downcast='float')
...     .fillna(df)
... )
>>> df
     A  B    C
0    x  1  2.1
1  3.2  y    5
# confirm that the values are floats
>>> type(df.at[0,'C'])
float
>>> type(df.at[1,'C'])
float
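For anyone who wants this as a reusable function, here is a minimal sketch of the same to_numeric idea. The helper name to_float_where_numeric is made up for illustration and is not part of pandas. It leaves out downcast='float' (which asks pandas for the smallest float dtype that fits) so converted values stay 64-bit, and it uses where instead of fillna to make the fallback to the original value explicit. Note that pd.to_numeric is more permissive than the isdigit check in the question: it also parses negative numbers and exponent notation such as "1e3".

```python
import pandas as pd


def to_float_where_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new DataFrame in which numeric-looking values are floats.

    Values that cannot be parsed as numbers are kept exactly as they were.
    (Hypothetical helper, not a pandas function.)
    """
    # Parse every column; anything unparseable becomes NaN.
    numeric = df.apply(pd.to_numeric, errors="coerce")
    # Keep the parsed float where parsing succeeded, else the original value.
    # astype(object) lets floats and strings share a column.
    return numeric.astype(object).where(numeric.notna(), df)


df = pd.DataFrame([["x", 1, "2.1"], [3.2, "y", "5."]], columns=list("ABC"))
result = to_float_where_numeric(df)
print(result)
print(type(result.at[0, "B"]), type(result.at[1, "C"]))
```

Columns that still contain strings end up with object dtype, so the speed-up applies to the conversion step rather than to later arithmetic on those mixed columns.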