Home > Software design >  filter dataframe using isna() to filter ourt rows that have null value in following columns
filter dataframe using isna() to filter ourt rows that have null value in following columns

Time:04-29

I have dataframe similar to this one:

     id     name    val1_rain   val2_tik    val3_bon   val4_tig ...
0   2349    Rivi    0.11       0.34            0.78       0.21
1   3397    Mani    NaN        NaN             NaN        NaN
2   0835    Pigi    0.34       NaN             0.32       NaN
3   5093    Tari    0.65       0.12            0.34       2.45
4   2340    Yoti    NaN        NaN             NaN        NaN

I want to drop any row that has all null values for all the columns that come after the name column ( [:,2:]). So the result output would look like this:

     id     name    val1_rain   val2_tik    val3_bon   val4_tig ...
0   2349    Rivi    0.11       0.34            0.78       0.21
2   0835    Pigi    0.34       NaN             0.32       NaN
3   5093    Tari    0.65       0.12            0.34       2.45

I have tried to do something like this:

df[~df.iloc[:,2:].isnull()]

but that raised an error:

ValueError: cannot reindex from a duplicate axis

First of all, I'm not sure why the error speaks about duplicate axis.

Then, I would like to find a way that I can have only rows that have any value at any column after the 2nd column.

I haven't found any question similar to this.

CodePudding user response:

You can filter if exist at least one non missing values after second columns with DataFrame.notna and DataFrame.any:

df = df[df.iloc[:,2:].notna().any(axis=1)]
print (df)
     id  name  val1_rain  val2_tik  val3_bon  val4_tig
0  2349  Rivi       0.11      0.34      0.78      0.21
2   835  Pigi       0.34       NaN      0.32       NaN
3  5093  Tari       0.65      0.12      0.34      2.45
  • Related