How do I create a new column which joins the column names for any non na values on a per row basis.
- Please note the duplicate index.
Code
so_df = pd.DataFrame({"ma_1":[10,np.nan,13,15],
"ma_2":[10,11,np.nan,15],
"ma_3":[np.nan,11,np.nan,15]},index=[0,1,1,2])
Example DF
ma_1 ma_2 ma_3
0 10.0 10.0 NaN
1 NaN 11.0 11.0
1 13.0 NaN NaN
2 15.0 15.0 15.0
Desired output is a new column which joins the column names for non na values as per col_names
example below.
so_df["col_names"] = ["ma_1, ma_2","ma_2, ma_3","ma_1","ma_1, ma_2, ma_3"]
ma_1 ma_2 ma_3 col_names
0 10.0 10.0 NaN ma_1, ma_2
1 NaN 11.0 11.0 ma_2, ma_3
1 13.0 NaN NaN ma_1
2 15.0 15.0 15.0 ma_1, ma_2, ma_3
CodePudding user response:
Try with dot
df['new'] = df.notna().dot(df.columns ',').str[:-1]
df
Out[77]:
ma_1 ma_2 ma_3 new
0 10.0 10.0 NaN ma_1,ma_2
1 NaN 11.0 11.0 ma_2,ma_3
1 13.0 NaN NaN ma_1
2 15.0 15.0 15.0 ma_1,ma_2,ma_3