For example, I have a dataframe 'df' like this:
Name | color | id | weight
---|---|---|---
john | blue | 67 | 70
clara | yellow | - | 67
diana | red | 89 | 56
Here the numeric columns "id" and "weight" should contain only numeric values, but the second value of "id" is a '-'.
If I do df.dtypes, it returns:
column | dtype
---|---
name | object
color | object
id | object
weight | float
**How can I traverse the dataframe column-wise, check whether each column's dtype is object, and, if it is, check whether it became an object because of a stray '-' (as in "id"), raising a flag if so?**
CodePudding user response:
Zip the column names with the dtypes to get (name, dtype) pairs:
for col_name, col_type in zip(df.columns, df.dtypes):
    if col_type == "object":
        # do whatever here
        pass
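One way to fill in the "do whatever here" part, as a rough sketch rather than a definitive answer: inside the loop, try to parse the column with `pd.to_numeric(..., errors="coerce")` and look at which values fail. The frame below is made up to mirror the table in the question, and `flagged` is just an illustrative name. It assumes the stray value is exactly the string "-".

```python
import pandas as pd

# Made-up frame mirroring the question's table; "id" mixes ints with a "-".
df = pd.DataFrame({
    "name": ["john", "clara", "diana"],
    "color": ["blue", "yellow", "red"],
    "id": [67, "-", 89],
    "weight": [70.0, 67.0, 56.0],
})

flagged = {}  # column name -> row labels that hold the stray "-"
for col_name, col_type in zip(df.columns, df.dtypes):
    if col_type != "object":
        continue
    column = df[col_name]
    # Values that cannot be parsed as numbers become NaN.
    as_number = pd.to_numeric(column, errors="coerce")
    unparsable = column[as_number.isna() & column.notna()]
    # Flag only when there are bad values and every one of them is exactly "-".
    if len(unparsable) > 0 and (unparsable == "-").all():
        flagged[col_name] = unparsable.index.tolist()

print(flagged)
```

With this data only "id" should end up in `flagged`, since "name" and "color" contain other non-numeric values.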
CodePudding user response:
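object_cols = df.select_dtypes("object").columns
result = []
for col in object_cols:
    try:
        # Treat "-" as 0, then test whether the whole column casts to int
        df[col].replace("-", 0).astype("int64")
        result.append(col)
    except (ValueError, TypeError):
        # The column holds other non-numeric values, so skip it
        pass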
`result` then contains every object column that could be cast to int once the "-" values are replaced.
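Note that the replace-with-0 check also accepts object columns that contain no "-" at all (for example numeric strings), and 0 is an invented value. A possible variant, sketched under the assumption that "-" should count as missing: require that the column really contains a "-", test the remaining values, and then convert with `errors="coerce"` so the "-" becomes NaN. The "code" column here is hypothetical, added only to show a column that is skipped.

```python
import pandas as pd

# Made-up data; "code" is a hypothetical column of numeric strings without "-".
df = pd.DataFrame({
    "name": ["john", "clara", "diana"],
    "id": [67, "-", 89],
    "code": ["12", "34", "56"],
})

result = []
for col in df.select_dtypes("object").columns:
    if not df[col].eq("-").any():
        continue  # no "-" here, so "-" is not why this column is object
    try:
        # Everything except the "-" entries must parse as a number.
        pd.to_numeric(df[col][df[col] != "-"])
    except (ValueError, TypeError):
        continue  # other non-numeric values remain
    result.append(col)

# Convert the flagged columns, turning "-" into NaN instead of 0.
for col in result:
    df[col] = pd.to_numeric(df[col], errors="coerce")
```

After this, `result` should hold only "id", and `df["id"]` becomes a float column with NaN where the "-" was.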