I'm trying to go through each row in a data frame, check whether the row has more than 3 null values (this part works), and then delete the entire row. However, when I try to drop those rows from the data frame, I get an error:
AttributeError: 'NoneType' object has no attribute 'index'
Forgive me if this code is inefficient; I only need it to work.
import pandas as pd
df = pd.read_csv('data/mycsv.csv')
i = 0
while i < len(df.index):
    if df.iloc[i].isnull().sum() > 3:
        df = df.drop(df.index[i], inplace = True)
    i += 1
CodePudding user response:
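Use DataFrame.dropna with the thresh parameter. Because thresh is the minimum number of non-NaN values a row needs in order to be kept, subtract N (the maximum number of NaNs allowed) from the number of columns: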
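import numpy as np
import pandas as pd

# Sample data: 5 rows x 6 columns, each cell randomly NaN or 1
np.random.seed(2021)
df = pd.DataFrame(np.random.choice([np.nan, 1], size=(5,6)))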
print (df)
     0    1    2    3    4    5
0  NaN  1.0  1.0  NaN  1.0  NaN
1  NaN  NaN  1.0  NaN  1.0  1.0
2  1.0  1.0  NaN  NaN  NaN  NaN
3  NaN  NaN  1.0  1.0  1.0  1.0
4  NaN  1.0  NaN  1.0  NaN  NaN
N = 3
df1 = df.dropna(thresh=len(df.columns) - N)
print(df1)
     0    1    2    3    4    5
0  NaN  1.0  1.0  NaN  1.0  NaN
1  NaN  NaN  1.0  NaN  1.0  1.0
3  NaN  NaN  1.0  1.0  1.0  1.0
N = 2
df2 = df.dropna(thresh=len(df.columns) - N)
print(df2)
     0    1    2    3    4    5
3  NaN  NaN  1.0  1.0  1.0  1.0
Alternatively, use boolean indexing to keep only the rows that have N or fewer NaNs:
N = 3
df1 = df[df.isnull().sum(axis=1) <= N]
print (df1)
     0    1    2    3    4    5
0  NaN  1.0  1.0  NaN  1.0  NaN
1  NaN  NaN  1.0  NaN  1.0  1.0
3  NaN  NaN  1.0  1.0  1.0  1.0
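As for the AttributeError itself, here is a minimal sketch of what seems to be going wrong in the question's loop. It assumes a small made-up frame in place of mycsv.csv (the column names and values are hypothetical). As far as I know, drop(..., inplace=True) modifies the frame and returns None, so assigning that result back to df replaces the frame with None, and the next len(df.index) fails.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data/mycsv.csv from the question
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 3.0],
    "c": [np.nan, np.nan, 3.0],
    "d": [np.nan, 2.0, 3.0],
    "e": [np.nan, 2.0, 3.0],
})

# With inplace=True, drop returns None; the question assigned this back to df
result = df.copy().drop(0, inplace=True)
print(result)

# One way out: skip inplace, build the mask once, and drop without a loop
too_many_nulls = df.isnull().sum(axis=1) > 3
cleaned = df.drop(df.index[too_many_nulls])
print(cleaned)
```

Row 0 has four NaNs and is dropped; row 1 has exactly three and stays, matching the "more than 3" condition in the question.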
CodePudding user response:
Use thresh=X as a parameter of dropna, where X is the number of columns (df.shape[1]) minus your threshold (3).
Suppose this dataframe:
>>> df
     0    1    2    3    4    5
0  NaN  NaN  NaN  NaN  NaN  NaN  # Drop (NaN = 6)
1  NaN  NaN  NaN  NaN  NaN  1.0  # Drop (NaN = 5)
2  NaN  NaN  NaN  NaN  1.0  1.0  # Drop (NaN = 4)
3  NaN  NaN  NaN  1.0  1.0  1.0  # Keep (NaN = 3)
4  NaN  NaN  1.0  1.0  1.0  1.0  # Keep (NaN = 2)
5  NaN  1.0  1.0  1.0  1.0  1.0  # Keep (NaN = 1)
6  1.0  1.0  1.0  1.0  1.0  1.0  # Keep (NaN = 0)
df = df.dropna(thresh=df.shape[1] - 3)
print(df)
     0    1    2    3    4    5
3  NaN  NaN  NaN  1.0  1.0  1.0
4  NaN  NaN  1.0  1.0  1.0  1.0
5  NaN  1.0  1.0  1.0  1.0  1.0
6  1.0  1.0  1.0  1.0  1.0  1.0
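To try this yourself, the frame above can be rebuilt with a short sketch like the following. The construction and the names n_cols, max_nulls, kept and same are my own and only for illustration; the dropna call is the same one as above, with a boolean-indexing version alongside as a cross-check.

```python
import numpy as np
import pandas as pd

n_cols = 6
# Row i has (n_cols - i) leading NaNs followed by i ones, for i = 0..6
data = [[np.nan] * (n_cols - i) + [1.0] * i for i in range(n_cols + 1)]
df = pd.DataFrame(data)

max_nulls = 3  # rows with more NaNs than this are dropped

# thresh is the minimum count of non-NaN values a row must have to survive
kept = df.dropna(thresh=df.shape[1] - max_nulls)

# Equivalent filter expressed as a boolean mask on the NaN count per row
same = df[df.isnull().sum(axis=1) <= max_nulls]

print(kept)
print(kept.equals(same))
```

Both approaches should keep rows 3 through 6 here; which one to use is mostly a matter of which reads more clearly to you.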