I have a DataFrame of unknown size and shape, for example:
   first1  first2  first3  first4
a     NaN      22    56.0      65
c   380.0      40     NaN      66
b   390.0      50    80.0      64
My objective is to delete all columns and rows at which there is a NaN value. In this specific case, the output should be:
   first2  first4
b      50      64
Also, I need to preserve the option to use "all" as in pandas.DataFrame.dropna: when the argument "all" is passed, a column or row must be dropped only if all of its values are missing.
When I tried the following code:
def dropna_mta_style(df, how='any'):
    new_df = df.dropna(axis=0, how=how).dropna(axis=1, how=how)
    return new_df
It obviously didn't work: once the NaN-containing rows are dropped, their NaN values are gone from the remaining frame, so the second dropna call no longer finds the columns that should have been removed.
Thanks in advance!
P.S.: for and while loops, Python built-in functions that act on iterables (all, any, map, ...), and list and dictionary comprehensions must not be used.
CodePudding user response:
Would something like this work?
df.dropna(axis=1, how='any').loc[df.dropna(axis=0, how='any').index]
(Meaning: we take the index of every row that contains no NaNs, df.dropna(axis=0, how='any').index, then use it to select those rows from the frame obtained by dropping every column that has at least one NaN.)
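A quick check of that expression against the frame from the question (the values are reconstructed from the printed table, so treat the setup as an assumption):

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the question's printed table (assumed values)
df = pd.DataFrame(
    {'first1': [np.nan, 380.0, 390.0],
     'first2': [22, 40, 50],
     'first3': [56.0, np.nan, 80.0],
     'first4': [65, 66, 64]},
    index=['a', 'c', 'b'])

# Drop NaN-containing columns, then keep only the rows that had
# no NaNs in the original frame
out = df.dropna(axis=1, how='any').loc[df.dropna(axis=0, how='any').index]
print(out)
```

This leaves only row b with columns first2 and first4, matching the desired output. The key point is that both dropna calls run against the original df, so neither pass hides NaNs from the other.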
CodePudding user response:
This should remove all rows and columns dynamically
import numpy as np

# Flag rows that contain at least one NaN
df['Check'] = df.isin([np.nan]).any(axis=1)
# Drop NaN-containing columns ('Check' has no NaNs, so it survives)
df = df.dropna(axis=1)
# Keep only the rows that were flagged as NaN-free
df = df.loc[df['Check'] == False]
# Remove the helper column
df.drop('Check', axis=1, inplace=True)
df
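As a side note, df.isna().any(axis=1) is the more idiomatic spelling of that NaN check. A minimal run of the same idea, assuming the frame from the question:

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the question's printed table (assumed values)
df = pd.DataFrame(
    {'first1': [np.nan, 380.0, 390.0],
     'first2': [22, 40, 50],
     'first3': [56.0, np.nan, 80.0],
     'first4': [65, 66, 64]},
    index=['a', 'c', 'b'])

df['Check'] = df.isna().any(axis=1)  # True for rows containing a NaN
df = df.dropna(axis=1)               # drop NaN-containing columns
df = df.loc[~df['Check']]            # keep only NaN-free rows
df = df.drop('Check', axis=1)        # remove the helper column
```

The row mask must be computed before dropping columns, because afterwards the surviving columns no longer contain any NaNs to flag.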
CodePudding user response:
Solution intended for readability:
rows = df.dropna(axis=0).index    # labels of rows without NaNs
cols = df.dropna(axis=1).columns  # labels of columns without NaNs
df = df.loc[rows, cols]
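Wrapped into a function with the how parameter the question asks for (the name dropna_mta_style is taken from the question; this is a sketch, not a drop-in from any library):

```python
import numpy as np
import pandas as pd

def dropna_mta_style(df, how='any'):
    # Compute surviving row and column labels independently, both
    # against the *original* frame, then select them in one shot.
    rows = df.dropna(axis=0, how=how).index
    cols = df.dropna(axis=1, how=how).columns
    return df.loc[rows, cols]

# Frame reconstructed from the question's printed table (assumed values)
df = pd.DataFrame(
    {'first1': [np.nan, 380.0, 390.0],
     'first2': [22, 40, 50],
     'first3': [56.0, np.nan, 80.0],
     'first4': [65, 66, 64]},
    index=['a', 'c', 'b'])

result = dropna_mta_style(df)                # drops a, c, first1, first3
untouched = dropna_mta_style(df, how='all')  # nothing here is all-NaN
```

With how='all', no row or column of this sample is entirely NaN, so the frame comes back unchanged, which matches pandas.DataFrame.dropna semantics.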