Given a table
| | col_0 | col_1 | col_2 |
|---|---|---|---|
| 0 | a_00 | a_01 | a_02 |
| 1 | a_10 | nan | a_12 |
| 2 | a_20 | a_21 | a_22 |
If I want to return all rows where col_1 does not contain nan, that is easily done with df[df['col_1'].notnull()], which returns
| | col_0 | col_1 | col_2 |
|---|---|---|---|
| 0 | a_00 | a_01 | a_02 |
| 2 | a_20 | a_21 | a_22 |
If I would like to return all columns whose row 1 does not contain nan, what should I do? The following is the result that I want:
| | col_0 | col_2 |
|---|---|---|
| 0 | a_00 | a_02 |
| 1 | a_10 | a_12 |
| 2 | a_20 | a_22 |
I can transpose the dataframe, remove rows from the transposed dataframe, and transpose back, but that would be inefficient if the dataframe is huge. I also tried
df.loc[df.loc[0].notnull()]
but the code gives me an error. Any ideas?
CodePudding user response:
You can use the pandas DataFrame.dropna() function for this.
Case 1: drop every column that contains a nan value
ex: df.dropna(axis=1)
axis=0 refers to rows and axis=1 refers to columns.
Case 2: look at only the first n rows
ex: df[:n].dropna(axis=1)
This returns just the first n rows, without the columns that contain nan in those rows.
Case 3: drop columns within a chosen set of columns
ex: df[["col_1", "col_2"]].dropna(axis=1)
This keeps whichever of these two columns contain no nan.
Note: dropna returns a new dataframe. To keep the change, either use inplace=True (df.dropna(axis=1, inplace=True)) or assign the result to another variable (df2 = df.dropna(axis=1)).
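As a rough sketch of how the three cases behave, the snippet below rebuilds the table from the question (the construction of df is an assumption, since the question only shows the rendered table) and applies each call. The value n = 1 is picked purely for illustration.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's table; nan sits in row 1 of col_1.
df = pd.DataFrame(
    {
        "col_0": ["a_00", "a_10", "a_20"],
        "col_1": ["a_01", np.nan, "a_21"],
        "col_2": ["a_02", "a_12", "a_22"],
    }
)

# Case 1: drop every column that has a nan in any row.
all_rows = df.dropna(axis=1)
print(list(all_rows.columns))

# Case 2: only the first n rows are inspected, and only those rows are returned.
n = 1  # illustrative value
first_n = df[:n].dropna(axis=1)
print(first_n.shape)

# Case 3: restrict the check to a chosen set of columns.
subset = df[["col_1", "col_2"]].dropna(axis=1)
print(list(subset.columns))

# None of the calls above modified df itself.
print(df.shape)
```

Note that case 1 checks all rows, so it matches the wanted result here only because row 1 holds the table's single nan.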
CodePudding user response:
Use boolean indexing with loc along the columns axis:
df.loc[:, df.iloc[1].notna()]
Result
col_0 col_2
0 a_00 a_02
1 a_10 a_12
2 a_20 a_22
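For a self-contained version of this, here is a minimal sketch that assumes the dataframe is built as below (the question only shows the rendered table). The mask from df.iloc[1].notna() is indexed by column names, so it goes in the column position of loc. The dropna call with subset is an alternative not mentioned above and is included only for comparison.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's table.
df = pd.DataFrame(
    {
        "col_0": ["a_00", "a_10", "a_20"],
        "col_1": ["a_01", np.nan, "a_21"],
        "col_2": ["a_02", "a_12", "a_22"],
    }
)

# Boolean Series indexed by column names: True where row 1 is not nan.
mask = df.iloc[1].notna()

# Keep all rows (:) and only the columns where the mask is True.
result = df.loc[:, mask]
print(result)

# Alternative: with axis=1, subset names the row labels to inspect.
alt = df.dropna(axis=1, subset=[1])
print(result.equals(alt))
```

No transpose is needed in either form, since only one row is inspected to build the column selection.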