conditional filtering on rows rather than columns


Given a table

  col_0 col_1 col_2
0  a_00  a_01  a_02
1  a_10   NaN  a_12
2  a_20  a_21  a_22

If I want to return all rows such that col_1 does not contain NaN, that is easily done with df[df['col_1'].notnull()], which returns

  col_0 col_1 col_2
0  a_00  a_01  a_02
2  a_20  a_21  a_22

What if I instead want to return all columns whose value in the row at index 1 is not NaN? This is the result I want:

  col_0 col_2
0  a_00  a_02
1  a_10  a_12
2  a_20  a_22

I could transpose the DataFrame, remove the rows on the transposed DataFrame, and transpose back, but that becomes inefficient if the DataFrame is huge.
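For reference, that transpose round-trip would look roughly like this sketch; each .T call copies the whole frame, which is where the cost comes from:

df.T[df.T[1].notna()].T

I also tried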

df.loc[df.loc[0].notnull()]

but the code gives me an error. Any ideas?

CodePudding user response:

You can use the pandas DataFrame.dropna() function for this.

case 1: drop every column that contains NaN values:

     ex: df.dropna(axis=1)

axis=0 tells dropna to drop rows that contain NaN, while axis=1 tells it to drop columns that contain NaN, which is what you want here.
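A minimal runnable sketch (rebuilding the example frame is my assumption, since the question never shows its construction):

     import numpy as np
     import pandas as pd

     # Rebuild the example frame from the question; np.nan is the missing value in col_1.
     df = pd.DataFrame({"col_0": ["a_00", "a_10", "a_20"],
                        "col_1": ["a_01", np.nan, "a_21"],
                        "col_2": ["a_02", "a_12", "a_22"]})

     df.dropna(axis=0)  # drops row 1, same effect as df[df['col_1'].notnull()]
     df.dropna(axis=1)  # drops col_1 -> the output the question asks for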

case 2: drop columns based on NaN values in only the first n rows:

     ex: df[:n].dropna(axis=1)
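Note that df[:n] also truncates the result to those first n rows. If you want to keep every row and only restrict the NaN check to the first n rows, one possible variant (my sketch, not part of the original answer):

     ex: df.loc[:, df[:n].notna().all()]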

case 3: drop columns within a subset of columns:

     ex: df[["col_1","col_2"]].dropna(axis = 1)  

This drops whichever of these two columns contains NaN values.
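With the example frame from the question, that call keeps only col_2, because col_1 holds a NaN:

       col_2
     0  a_02
     1  a_12
     2  a_22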

Note: if you want to make this change permanent, either use inplace=True (df.dropna(axis=1, inplace=True)) or assign the result to another variable (df2 = df.dropna(axis=1)).

CodePudding user response:

Use boolean indexing with loc along the columns axis:

df.loc[:, df.iloc[1].notna()]

Result

  col_0 col_2
0  a_00  a_02
1  a_10  a_12
2  a_20  a_22
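Since the frame has the default integer index, a label-based mask is equivalent; the asker's own attempt failed only because the mask was passed as the row indexer instead of the column indexer:

df.loc[:, df.loc[1].notna()]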