I want to count the NaN values in each row of a DataFrame and then find the row with the fewest of them. My solution is too slow, and looping over the rows with a for loop is not the pandas way to do it. Is there a better and faster way?
max_not_nan = 13  # upper bound on the NaN count per row (number of columns + 1)
row_number = 0
for i in range(df.shape[0]):
    if df.iloc[i].isna().sum() < max_not_nan:
        max_not_nan = df.iloc[i].isna().sum()
        row_number = i
It works fine except for the time complexity.
CodePudding user response:
Can you try this:
df['nan_count'] = df.isnull().sum(axis=1)  # NaN count for each row, stored as a new column
max_nan = df[df['nan_count'] == df['nan_count'].max()]  # all rows tied for the highest NaN count
min_nan = df[df['nan_count'] == df['nan_count'].min()]  # all rows tied for the lowest NaN count
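If you only need the single row that the loop in the question would pick (the first one with the fewest NaN values), something along these lines may be enough. This is a minimal sketch on made-up sample data; the column names and values are assumptions for illustration only. It keeps the counts in a separate Series so that df is not modified.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
})

# Per-row NaN counts, kept as a separate Series so df is left unchanged.
nan_count = df.isna().sum(axis=1)

# Index label of the first row with the fewest NaN values.
min_label = nan_count.idxmin()

# Integer position of that row, comparable to row_number in the question's loop.
row_number = int(nan_count.to_numpy().argmin())
```

With a default RangeIndex the label and the position are the same; they differ only when the DataFrame has a custom index.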
CodePudding user response:
transactions.isnull().sum(axis=1).sort_values()
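The one-liner above returns the NaN count of every row in ascending order, so the first entry is the row with the fewest NaN values. A small sketch of how that could be used follows; the sample DataFrame and its index labels are made up, and kind="stable" is an optional extra so that tied rows keep their original order.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `transactions` DataFrame used in the answer.
transactions = pd.DataFrame(
    {"x": [np.nan, 2.0, np.nan], "y": [np.nan, 5.0, 6.0]},
    index=["r0", "r1", "r2"],
)

# NaN count per row, sorted ascending; a stable sort keeps ties in original order.
sorted_counts = transactions.isnull().sum(axis=1).sort_values(kind="stable")

# The first entry of the sorted Series belongs to the row with the fewest NaN values.
best_label = sorted_counts.index[0]
best_row = transactions.loc[best_label]
```

Sorting costs more than a single pass for the minimum, so it is mainly worth it when you want the full ranking of rows and not just the best one.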