Home > Mobile >  Clean way to rearrange columns that are repeated and have nans in them
Clean way to rearrange columns that are repeated and have nans in them

Time:05-21

I have the following dataframe:

Subject  Val1  Val1  Int  Val1  Val1  Int2  Val1
A        1     2     3    NaN   NaN   Sp    NaN
B        NaN   NaN   NaN  2     3     NaN   NaN
C        NaN   NaN   4    NaN   NaN   0     3 
D        NaN   NaN   3    NaN   NaN   8     NaN

I want to ended up with only 2 column that are val1 because it has at most 2 non-nans for a given subject. Namely, the output would look like this:

Subject  Val1  Val1 Int Int2
A        1     2    3   Sp
B        2     3    NaN NaN
C        3     NaN  4   0
D        NaN   NaN  3   8

is there a function in pandas to do this in a clean way? Clean meaning only a few lines of code. Because one way would be to iterate through row with a for loop and bring all nonnan values to the left, but I'd like something cleaner and more efficient as well.

CodePudding user response:

Idea is per groups by duplicated columns names use lambda function for sort values based by missing values, so possible remove all columns with only missing values in last steps:

df = df.set_index('Subject')
f = lambda x: pd.DataFrame(x.apply(sorted, key=pd.isna, axis=1).tolist(), index=x.index)
df = df.groupby(level=0, axis=1).apply(f).dropna(axis=1, how='all').droplevel(1, axis=1)
print (df)
         Int Int2  Val1  Val1
Subject                      
A        3.0   Sp   1.0   2.0
B        NaN  NaN   2.0   3.0
C        4.0    0   3.0   NaN
D        3.0    8   NaN   NaN
  • Related