Dropping rows from a pandas DataFrame when multiple null values exist

I'm trying to go through each row in a data frame, check whether the row has more than 3 null values (this part works), and then delete the entire row. However, when I try to drop those rows from the data frame, I get an error:

AttributeError: 'NoneType' object has no attribute 'index'

Forgive me if this code is inefficient; I only need it to work.

import pandas as pd

df = pd.read_csv('data/mycsv.csv')


i = 0

while i < len(df.index):
    if df.iloc[i].isnull().sum() > 3:    
        df = df.drop(df.index[i], inplace = True)
    i += 1

CodePudding user response：

Use DataFrame.dropna with the thresh parameter. Because thresh is the minimum number of non-NaN values a row needs in order to be kept, subtract the allowed number of NaNs from the number of columns:

import numpy as np
import pandas as pd

np.random.seed(2021)

df = pd.DataFrame(np.random.choice([np.nan, 1], size=(5,6)))
print (df)
     0    1    2    3    4    5
0  NaN  1.0  1.0  NaN  1.0  NaN
1  NaN  NaN  1.0  NaN  1.0  1.0
2  1.0  1.0  NaN  NaN  NaN  NaN
3  NaN  NaN  1.0  1.0  1.0  1.0
4  NaN  1.0  NaN  1.0  NaN  NaN

N = 3
df1 = df.dropna(thresh=len(df.columns) - N)
print(df1)
    0    1    2    3    4    5
0 NaN  1.0  1.0  NaN  1.0  NaN
1 NaN  NaN  1.0  NaN  1.0  1.0
3 NaN  NaN  1.0  1.0  1.0  1.0


N = 2
df2 = df.dropna(thresh=len(df.columns) - N)
print(df2)
    0   1    2    3    4    5
3 NaN NaN  1.0  1.0  1.0  1.0

Alternatively, keep only the rows with at most N (here 3) NaNs by using boolean indexing:

N = 3
df1 = df[df.isnull().sum(axis=1) <= N]
print (df1)
    0    1    2    3    4    5
0 NaN  1.0  1.0  NaN  1.0  NaN
1 NaN  NaN  1.0  NaN  1.0  1.0
3 NaN  NaN  1.0  1.0  1.0  1.0
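
As for the error itself: it most likely comes from combining an assignment with inplace=True. With inplace=True, drop modifies the frame and returns None (which matches the 'NoneType' message in the question), so df = df.drop(..., inplace=True) rebinds df to None and the next len(df.index) fails. Below is a minimal sketch of a loop-style fix, assuming a small made-up frame; the column names and values are illustrative only and are not from the question's CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [np.nan, np.nan, np.nan, 1.0],
    "c": [np.nan, np.nan, 3.0, 1.0],
    "d": [np.nan, np.nan, 1.0, 1.0],
    "e": [1.0, np.nan, 2.0, np.nan],
})

# Use either `df.drop(..., inplace=True)` or `df = df.drop(...)`, never both:
# the inplace form gives back None, which is what broke the original loop.

# Collect the labels of rows with more than 3 nulls first, then drop them
# in a single call. This also avoids skipping rows, which can happen when
# rows are removed while iterating by position.
to_drop = [label for label, row in df.iterrows() if row.isnull().sum() > 3]
cleaned = df.drop(index=to_drop)

print(cleaned.index.tolist())  # [0, 2, 3]
```

Only row 1 (five NaNs) has more than 3 nulls here, so it is the only one removed. The vectorized versions above are still preferable for large frames.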

CodePudding user response：

Pass thresh=X to dropna, where X is the number of columns (df.shape[1]) minus the maximum number of NaNs you allow per row (3).

Suppose you have this dataframe:

>>> df
     0    1    2    3    4    5
0  NaN  NaN  NaN  NaN  NaN  NaN  # Drop (NaN = 6)
1  NaN  NaN  NaN  NaN  NaN  1.0  # Drop (NaN = 5)
2  NaN  NaN  NaN  NaN  1.0  1.0  # Drop (NaN = 4)
3  NaN  NaN  NaN  1.0  1.0  1.0  # Keep (NaN = 3)
4  NaN  NaN  1.0  1.0  1.0  1.0  # Keep (NaN = 2)
5  NaN  1.0  1.0  1.0  1.0  1.0  # Keep (NaN = 1)
6  1.0  1.0  1.0  1.0  1.0  1.0  # Keep (NaN = 0)

df = df.dropna(thresh=df.shape[1] - 3)
print(df)

     0    1    2    3    4    5
3  NaN  NaN  NaN  1.0  1.0  1.0
4  NaN  NaN  1.0  1.0  1.0  1.0
5  NaN  1.0  1.0  1.0  1.0  1.0
6  1.0  1.0  1.0  1.0  1.0  1.0
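
To try this yourself, here is a self-contained sketch that rebuilds the example frame and checks that the thresh approach and the boolean-mask approach keep the same rows. The construction of the frame and the variable names (n_cols, max_nulls, kept_by_thresh, kept_by_mask) are my own assumptions, chosen to reproduce the table shown.

```python
import numpy as np
import pandas as pd

# Row k has (6 - k) leading NaNs followed by k ones, like the table above.
n_cols = 6
data = [[np.nan] * (n_cols - k) + [1.0] * k for k in range(n_cols + 1)]
df = pd.DataFrame(data)

max_nulls = 3  # keep rows with at most this many NaNs

# thresh counts non-NaN values, so convert "at most 3 NaNs"
# into "at least (number of columns - 3) non-NaN values".
kept_by_thresh = df.dropna(thresh=df.shape[1] - max_nulls)

# Equivalent filter that counts NaNs per row directly.
kept_by_mask = df[df.isnull().sum(axis=1) <= max_nulls]

print(kept_by_thresh.index.tolist())        # [3, 4, 5, 6]
print(kept_by_thresh.equals(kept_by_mask))  # True
```

Both forms return a new frame and leave df untouched, so assign the result back if you want to replace the original.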