This is not about dropping columns whose name contains a string.
I have a dataframe with 1600 columns. Several hundred are garbage. Most of the garbage columns contain a phrase such as invalid value encountered in double_scalars (XYZ), where 'XYZ' is a placeholder for the column name.
I would like to delete all columns that contain, in any of their elements, the string invalid.
Purging columns with strings in general would work too. What I want is to clean it up so I can fit a machine learning model to it, so removing any/all columns that are not boolean or real would work.
This must be a duplicate question, but I can only find answers to how to remove a column with a specific column name.
CodePudding user response:
Use apply to make a mask checking if each column contains invalid, and then pass that mask to the second position of .loc:
df = df.loc[:, ~df.apply(lambda col: col.astype(str).str.contains('invalid')).any()]
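To see what that one-liner does step by step, here is a minimal sketch on a made-up three-column frame. The column names and values are hypothetical and only stand in for the real 1600-column data; the mask is built the same way as above and then inverted to keep the clean columns.

```python
import pandas as pd

# Hypothetical toy data: column "b" holds an error message in one cell.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": ["0.5", "invalid value encountered in double_scalars (b)", "0.7"],
    "c": [True, False, True],
})

# Cast every cell to str so .str.contains works on non-string columns too,
# then reduce with .any() to get one True/False per column.
has_invalid = df.apply(lambda col: col.astype(str).str.contains("invalid")).any()

# Keep all rows, and only the columns where the mask is False.
cleaned = df.loc[:, ~has_invalid]
print(list(cleaned.columns))
```

Note that casting all 1600 columns to strings may be slow on a large frame, so restricting the check to the non-numeric columns first could be worth trying.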
CodePudding user response:
You can use df.select_dtypes(include=[float, bool]) or df.select_dtypes(exclude=['object'])
Link to docs: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.select_dtypes.html
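As a rough illustration of the dtype-based approach, the sketch below uses a hypothetical four-column frame. It assumes you also want to keep integer columns, so it passes "number" instead of float; with include=[float, bool] the integer column "n" would be dropped as well.

```python
import pandas as pd

# Hypothetical toy data: "msg" contains text, so it is not a numeric or bool column.
df = pd.DataFrame({
    "x": [0.1, 0.2],
    "flag": [True, False],
    "msg": ["ok", "invalid value encountered in double_scalars (msg)"],
    "n": [1, 2],
})

# Keep only numeric (int and float) and boolean columns.
numeric_or_bool = df.select_dtypes(include=["number", "bool"])
print(list(numeric_or_bool.columns))
```

A column that mixes numbers with an error string is stored as text rather than as a numeric dtype, so this removes the garbage columns without searching for the word invalid at all.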