I have dataframe similar to this one:
id name val1_rain val2_tik val3_bon val4_tig ...
0 2349 Rivi 0.11 0.34 0.78 0.21
1 3397 Mani NaN NaN NaN NaN
2 0835 Pigi 0.34 NaN 0.32 NaN
3 5093 Tari 0.65 0.12 0.34 2.45
4 2340 Yoti NaN NaN NaN NaN
I want to drop any row that has all null values for all the columns that come after the name column ( [:,2:]). So the result output would look like this:
id name val1_rain val2_tik val3_bon val4_tig ...
0 2349 Rivi 0.11 0.34 0.78 0.21
2 0835 Pigi 0.34 NaN 0.32 NaN
3 5093 Tari 0.65 0.12 0.34 2.45
I have tried to do something like this:
df[~df.iloc[:,2:].isnull()]
but that raised an error:
ValueError: cannot reindex from a duplicate axis
First of all, I'm not sure why the error speaks about duplicate axis.
Then, I would like to find a way that I can have only rows that have any value at any column after the 2nd column.
I haven't found any question similar to this.
CodePudding user response:
You can filter if exist at least one non missing values after second columns with DataFrame.notna
and DataFrame.any
:
df = df[df.iloc[:,2:].notna().any(axis=1)]
print (df)
id name val1_rain val2_tik val3_bon val4_tig
0 2349 Rivi 0.11 0.34 0.78 0.21
2 835 Pigi 0.34 NaN 0.32 NaN
3 5093 Tari 0.65 0.12 0.34 2.45