Clean way to rearrange columns that are repeated and have nans in them-CodePudding

I have the following dataframe:

Subject  Val1  Val1  Int  Val1  Val1  Int2  Val1
A        1     2     3    NaN   NaN   Sp    NaN
B        NaN   NaN   NaN  2     3     NaN   NaN
C        NaN   NaN   4    NaN   NaN   0     3 
D        NaN   NaN   3    NaN   NaN   8     NaN

I want to ended up with only 2 column that are val1 because it has at most 2 non-nans for a given subject. Namely, the output would look like this:

Subject  Val1  Val1 Int Int2
A        1     2    3   Sp
B        2     3    NaN NaN
C        3     NaN  4   0
D        NaN   NaN  3   8

is there a function in pandas to do this in a clean way? Clean meaning only a few lines of code. Because one way would be to iterate through row with a for loop and bring all nonnan values to the left, but I'd like something cleaner and more efficient as well.

CodePudding user response：

Idea is per groups by duplicated columns names use lambda function for sort values based by missing values, so possible remove all columns with only missing values in last steps:

df = df.set_index('Subject')
f = lambda x: pd.DataFrame(x.apply(sorted, key=pd.isna, axis=1).tolist(), index=x.index)
df = df.groupby(level=0, axis=1).apply(f).dropna(axis=1, how='all').droplevel(1, axis=1)
print (df)
         Int Int2  Val1  Val1
Subject                      
A        3.0   Sp   1.0   2.0
B        NaN  NaN   2.0   3.0
C        4.0    0   3.0   NaN
D        3.0    8   NaN   NaN