Home > Blockchain >  Pandas Dataframe: how do I drop columns with NaNs in column pairs
Pandas Dataframe: how do I drop columns with NaNs in column pairs

Time:04-14

Beginner in Pandas here and for a project I am getting dataframes likes this:

import numpy as np
import pandas as pd

data = [[np.nan, np.nan, np.nan, np.nan, "x", np.nan],
        [np.nan, np.nan, np.nan, np.nan, "y", np.nan],
        [np.nan, np.nan, np.nan, np.nan, "z", np.nan]]

cols = [("column1", "first"), ("column1", "last"), ("column2", "first"), ("column2", "last"), ("column3", "first"), ("column3", "last")]

df = pd.Dataframe(data, columns=cols)

I am getting pairs of columns and want to drop all columns where all elements of both parts of the pair are NaN. For the given code I'd expect the last column pair. Expected result:

column3 / first column3 / last
x nan
y nan
z nan

I have tried pandas.dropna() with different subsets but I cant get it to work that the last column is not deleted. The name of the columns are variable but always 2 levels deep.

Can some Pandas-God help this poor soul?

CodePudding user response:

First need MultiIndex in columns:

df = pd.DataFrame(data, columns=pd.MultiIndex.from_tuples(cols))

#if nott create it
#df = pd.Dataframe(data, columns=cols)
#df.columns = pd.MultiIndex.from_tuples(df.columns)
      

For each first level values test if all values are missing by GroupBy.transform and then test if at least one True per columns by DataFrame.any, chain inverted mask by ~ and | for bitwise OR and filter in DataFrame.loc:

m1 = ~df.isna().groupby(level=0, axis=1).transform('all').any()
m2 = ~df.isna().any()

df = df.loc[:, m1 | m2]
print(df)
  column3     
    first last
0       x  NaN
1       y  NaN
2       z  NaN
  • Related