Home > database >  How to delete rows which has nan or empty value in SPECIFIC column?
How to delete rows which has nan or empty value in SPECIFIC column?

Time:09-07

I have a dataframe which has nan or empty cell in specific column for example column index 2. unfortunately I don't have subset. I just have index. I want to delete the rows which has this features. in stackoverflow there are too many soluntions which are using subset

This is the dataframe for example:

12 125 36 45 665

15 212 12 65 62

65 9 nan 98 84

21 54 78 5 654

211 65 58 26 65

...

output:

12 125 36 45 665

15 212 12 65 62

21 54 78 5 654

211 65 58 26 65

CodePudding user response:

If need test third column (with index=2) use boolean indexing if nan is missing value np.nan or string nan:

idx = 2


df1 = df[df.iloc[:, idx].notna() & df.iloc[:, idx].ne('nan')]

#if no value is empty string or nan string or missing value NaN/None
#df1 = df[df.iloc[:, idx].notna() & ~df.iloc[:, idx].isin(['nan',''])]
print (df1)
     0    1     2   3    4
0   12  125  36.0  45  665
1   15  212  12.0  65   62
3   21   54  78.0   5  654
4  211   65  58.0  26   65

If nans are missing values:

df1 = df.dropna(subset=df.columns[[idx]])
print (df1)
     0    1     2   3    4
0   12  125  36.0  45  665
1   15  212  12.0  65   62
3   21   54  78.0   5  654
4  211   65  58.0  26   65

CodePudding user response:

Not sure what you mean by

there are too many soluntions which are using subset

but the way to do this would be

df[~df.isna().any(axis=1)]

CodePudding user response:

You can use notnull()

df = df.loc[df[df.columns[idx]].notnull()]
  • Related