I have a dataframe which has nan or empty cell in specific column for example column index 2. unfortunately I don't have subset. I just have index. I want to delete the rows which has this features. in stackoverflow there are too many soluntions which are using subset
This is the dataframe for example:
12 125 36 45 665
15 212 12 65 62
65 9 nan 98 84
21 54 78 5 654
211 65 58 26 65
...
output:
12 125 36 45 665
15 212 12 65 62
21 54 78 5 654
211 65 58 26 65
CodePudding user response:
If need test third column (with index=2) use boolean indexing
if nan
is missing value np.nan
or string nan
:
idx = 2
df1 = df[df.iloc[:, idx].notna() & df.iloc[:, idx].ne('nan')]
#if no value is empty string or nan string or missing value NaN/None
#df1 = df[df.iloc[:, idx].notna() & ~df.iloc[:, idx].isin(['nan',''])]
print (df1)
0 1 2 3 4
0 12 125 36.0 45 665
1 15 212 12.0 65 62
3 21 54 78.0 5 654
4 211 65 58.0 26 65
If nan
s are missing values:
df1 = df.dropna(subset=df.columns[[idx]])
print (df1)
0 1 2 3 4
0 12 125 36.0 45 665
1 15 212 12.0 65 62
3 21 54 78.0 5 654
4 211 65 58.0 26 65
CodePudding user response:
Not sure what you mean by
there are too many soluntions which are using subset
but the way to do this would be
df[~df.isna().any(axis=1)]
CodePudding user response:
You can use notnull()
df = df.loc[df[df.columns[idx]].notnull()]