Python Pandas: filter the names of columns that have one or more NaN values


Hello, can you help me please? I have a CSV and I want the names of the columns that are int64 or float and have one or more NaN values. I know how to filter only the columns that are int64 or float, but how do I add the condition for NaN?

# selects the numeric columns, but this still returns every numeric
# column name, whether or not the column actually contains a NaN
df.loc[:, (df.dtypes == np.int64) | (df.dtypes == np.float_)].isna().any().index.values

CodePudding user response:

The fact is that np.int64 columns can't hold missing values at all, so they never contain NaN. There are plenty of numeric dtypes that can, foremost float64, but of course also the nullable extension arrays (e.g. Int64).
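
A quick demonstration of that point (a minimal sketch on a throwaway Series, not the question's data): mixing integers with NaN forces pandas to upcast to float64, while the nullable Int64 extension dtype can genuinely hold a missing value.

import numpy as np
import pandas as pd

# an int64 column has no slot for NaN, so pandas upcasts to float64
print(pd.Series([1, 2, np.nan]).dtype)                 # float64

# the nullable extension dtype "Int64" can hold a real missing value
print(pd.Series([1, 2, None], dtype="Int64").dtype)    # Int64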

Here's what I would do: look at all numeric columns, then get a list of all that have any NA value.

# get some example data
import seaborn
df = seaborn.load_dataset("penguins")

# keep only the numeric columns, then check each one for any NaN
numeric_columns = df.select_dtypes("number")
column_has_nan = numeric_columns.isna().any(axis=0)
column_has_nan
# output
bill_length_mm       True
bill_depth_mm        True
flipper_length_mm    True
body_mass_g          True
dtype: bool

Filter to only those where column_has_nan is True; the column name is in the index of that result:

list(column_has_nan.loc[column_has_nan].index)
# output
['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']
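
The same list can also be produced in one line by indexing the columns with that boolean mask (just an alternative spelling of the step above, same result):

numeric_columns.columns[column_has_nan].tolist()
# output
['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']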

To put it all together concisely as a filter of the whole dataframe, I'd use something like this:

(df.select_dtypes("number")
 .loc[:, lambda df_: df_.isna().any(axis=0)])

Note that you can use ["float64", "int64"] or any other selection instead of "number".

Note that this uses the chained .loc with a callable, which lets the selection be computed from the dataframe as it exists at that point in the chain.
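
To illustrate that callable pattern in isolation, here's a minimal sketch on a made-up two-column frame (not the penguins data):

import numpy as np
import pandas as pd

demo = pd.DataFrame({"a": [1.0, np.nan], "b": [1.0, 2.0]})
# the lambda receives the dataframe as it exists at this point in the chain
demo.loc[:, lambda df_: df_.isna().any(axis=0)]
# output: only column "a" is kept, since it is the only one containing NaN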

CodePudding user response:

import numpy as np

df_numeric = df.select_dtypes(include=[np.float64, np.int64])
nan_cols = [col for col in df_numeric.columns if df_numeric[col].isnull().any()]
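
For example, assuming df is the penguins dataframe from the first answer, this yields the same list, and the names can be used to slice the original frame:

print(nan_cols)
# ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']
df[nan_cols]  # only the numeric columns that contain at least one NaN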