I am trying to use pandas.to_numeric() to convert the values of a column in my DataFrame to integers. The DataFrame is as follows:
| | QuestionID | Value |
|---|---|---|
| 0 | Q1 | 150.0 |
| 1 | Q2 | 160.0 |
| 2 | Q3 | NaN |
| 3 | Q4 | 210.0 |
| 4 | Q5 | Hello |
How can I convert the values to integers with pandas.to_numeric() when NaN and Hello are among them, while also dropping the rows that cannot be converted?
My expected dataframe is as follows:
| | QuestionID | Value |
|---|---|---|
| 0 | Q1 | 150 |
| 1 | Q2 | 160 |
| 3 | Q4 | 210 |
CodePudding user response:
errors='coerce' will return NaN for any non-numeric value; you can then drop those records with dropna():
df.assign(Value=pd.to_numeric(df.Value, errors='coerce')).dropna()
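Note that this one-liner still leaves "Value" as a float column (the NaN forces a float dtype before the drop), so if you want the integer values shown in the expected output you can cast after dropping. A minimal sketch, rebuilding the frame from the question:

```python
import pandas as pd

# Rebuild the question's frame: mixed floats, a NaN and a string,
# so the "Value" column starts out with object dtype.
df = pd.DataFrame({
    "QuestionID": ["Q1", "Q2", "Q3", "Q4", "Q5"],
    "Value": [150.0, 160.0, float("nan"), 210.0, "Hello"],
})

# errors='coerce' turns "Hello" into NaN, dropna() removes both bad rows,
# and the extra astype() gives the integer dtype from the expected output.
result = (
    df.assign(Value=pd.to_numeric(df["Value"], errors="coerce"))
      .dropna()
      .astype({"Value": int})
)
print(result)
#   QuestionID  Value
# 0         Q1    150
# 1         Q2    160
# 3         Q4    210
```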
CodePudding user response:
df = pd.DataFrame([["Q1", "150"], ["Q2", "160"], ["Q3", "NaN"],
["Q4", "210"], ["Q5", "Hello"]], columns=["QuestionID", "Value"])
df
  QuestionID  Value
0         Q1    150
1         Q2    160
2         Q3    NaN
3         Q4    210
4         Q5  Hello
Since you'd like to drop all invalid rows, I'd perhaps consider using pd.Series.str.isnumeric() as an indexer:
df = df[df["Value"].str.isnumeric()] # Keep rows with numeric values in "Value"
df.loc[:, "Value"] = df["Value"].astype(int) # Cast to integers
Alternatively, building on @Chris's suggestion, you can also add the integer type-casting after the df.assign call:
df.assign({"Value": pd.to_numeric(df["Value"], errors='coerce')).dropna().astype({"Value": int})