At each NaN value, drop the row and column it's located in from pandas DataFrame

Time:05-12

I have a DataFrame whose size and shape are not known in advance, for example:

   first1  first2  first3  first4
a     NaN      22    56.0      65
c   380.0      40     NaN      66
b   390.0      50    80.0      64

My objective is to delete every row and every column that contains a NaN value. In this specific case, the output should be:

   first2  first4
b      50      64

I also need to keep the how option of pandas.DataFrame.dropna: when how='all' is passed, a row or column must be dropped only if all of its values are missing.

When I tried the following code:

def dropna_mta_style(df, how='any'):
  new_df = df.dropna(axis=0, how = how).dropna(axis=1, how = how)
  return new_df

It didn't work, because it drops the rows first and only then searches for columns with NaNs, by which point the rows that held those NaNs have already been dropped.
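To make the failure concrete, here is a minimal sketch that rebuilds the sample frame and runs the two chained calls. The way the frame is constructed is an assumption; only the values and labels come from the example above.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (construction is assumed, values are not)
df = pd.DataFrame(
    {
        "first1": [np.nan, 380.0, 390.0],
        "first2": [22, 40, 50],
        "first3": [56.0, np.nan, 80.0],
        "first4": [65, 66, 64],
    },
    index=["a", "c", "b"],
)

# Step 1: rows "a" and "c" each hold a NaN, so only row "b" survives
after_rows = df.dropna(axis=0, how="any")

# Step 2: row "b" has no NaN, so no column is dropped at all
chained = after_rows.dropna(axis=1, how="any")
print(chained)
```

The result still has all four columns, because the NaNs that should have marked first1 and first3 for removal disappeared together with their rows.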

Thanks in advance!

P.S.: for and while loops, Python built-in functions that act on iterables (all, any, map, ...), and list and dictionary comprehensions must not be used.

CodePudding user response:

Would something like this work?

df.dropna(axis=1,how='any').loc[df.dropna(axis=0,how='any').index]

(That is, df.dropna(axis=0,how='any').index gives the labels of the rows that contain no NaN. We then use those labels to select rows from the original df after dropping every column that has at least one NaN. Both dropna calls look at the original df, so neither hides NaNs from the other.)
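Since the question also asks to keep the how option, the same expression can be wrapped in a function. This is only a sketch: the name dropna_both_axes is hypothetical, and the sample frame is rebuilt here under the assumption that it looks like the one in the question.

```python
import numpy as np
import pandas as pd


def dropna_both_axes(df, how="any"):
    """Drop rows and columns based on NaNs found in the original frame.

    Hypothetical helper; `how` is passed straight to DataFrame.dropna.
    """
    # Labels of the rows that survive a row-wise dropna on the original frame
    keep_rows = df.dropna(axis=0, how=how).index
    # Drop columns on the original frame too, then pick the surviving rows
    return df.dropna(axis=1, how=how).loc[keep_rows]


df = pd.DataFrame(
    {
        "first1": [np.nan, 380.0, 390.0],
        "first2": [22, 40, 50],
        "first3": [56.0, np.nan, 80.0],
        "first4": [65, 66, 64],
    },
    index=["a", "c", "b"],
)

result_any = dropna_both_axes(df)             # row "b", columns first2 and first4
result_all = dropna_both_axes(df, how="all")  # nothing is entirely NaN, so nothing is dropped
print(result_any)
```

It uses no loops, comprehensions, or Python built-ins over iterables, so it should fit the constraint in the P.S.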

CodePudding user response:

This should remove every row and column that contains a NaN, whatever the shape of the DataFrame:

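# Flag the rows that contain at least one NaN, before any column is dropped
df['Check'] = df.isna().any(axis=1)
# Drop the columns that contain a NaN ('Check' is boolean, so it survives)
df = df.dropna(axis=1)
# Keep only the unflagged rows, then remove the helper column
df = df.loc[~df['Check']].drop(columns='Check')
df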

CodePudding user response:

Solution intended for readability:

rows = df.dropna(axis=0).index
cols = df.dropna(axis=1).columns
df = df.loc[rows, cols]
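
A variant of the same idea that selects with boolean masks instead of labels, sketched here with a hypothetical function name (dropna_rows_and_cols). It also takes the how argument from the question. The assumption behind it is that the index may contain repeated labels, in which case selecting by label would, as far as I can tell, bring back every row that shares a label with a clean row.

```python
import numpy as np
import pandas as pd


def dropna_rows_and_cols(df, how="any"):
    """Hypothetical helper: drop rows and columns flagged on the original frame."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    missing = df.isna()
    if how == "any":
        bad_rows = missing.any(axis=1)  # row has at least one NaN
        bad_cols = missing.any(axis=0)  # column has at least one NaN
    else:
        bad_rows = missing.all(axis=1)  # row is entirely NaN
        bad_cols = missing.all(axis=0)  # column is entirely NaN
    # Positional boolean masks, so repeated index labels cause no ambiguity
    return df.loc[~bad_rows.to_numpy(), ~bad_cols.to_numpy()]


# Small frame with a repeated index label "a" (made-up data for illustration)
dup = pd.DataFrame(
    {"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, 6.0]},
    index=["a", "a", "b"],
)
cleaned = dropna_rows_and_cols(dup)
print(cleaned)
```

Here only the second "a" row and column x are dropped. The two masks are computed before anything is removed, which is the same ordering the rows/cols version relies on.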