Home > Software design >  While grouping two columns into a new dataframe , how do I create another column in the new datafram
While grouping two columns into a new dataframe , how do I create another column in the new datafram

Time:11-06

I am currently using the following code to combine Col A and Col B to get Col A & B in DataFrame B below:

out = (df
   .groupby('Col A', group_keys=False, sort=False)
   .apply(lambda d: d.iloc[:, ::-1].unstack().drop_duplicates())
   .reset_index(drop=True).to_frame(name='Col A&B')
)

My question is: how can I create the Col C in DataFrame B that uses the column headers from DataFrame A to label where each value in Col A & B came from?

DataFrame A

Col A Col B Col C
1000 100 10
1000 100 20
2000 200 30
2000 200 40

DataFrame B

Col A & B Col C
10 Col C
20 Col C
100 Col B
1000 Col A
30 Col C
40 Col C
200 Col B
2000 Col A

CodePudding user response:

You might want to use stack drop_duplicates instead of groupby:

df.stack().reset_index(level=1).drop_duplicates().rename(columns={0: 'Col A & B', 'level_1': 'Col C'})

   Col C  Col A & B
0  Col A        100
0  Col B         10
1  Col B         20
2  Col A        200
2  Col B         30
3  Col B         40

CodePudding user response:

You can use groupby and then use stack and sort_values in each group.

df_new = df.groupby('Col A'
                   ).apply(lambda g : g.stack().reset_index().sort_values([0],
                                                                          ascending=True)
                          ).reset_index(drop=True
                                       ).drop('level_0', 
                                              axis=1).rename(columns={0: 'Col A & B', 
                                                                      'level_1': 'Col C'}
                                                            ).drop_duplicates(subset=['Col C', 
                                                                                      'Col A & B'])

print(df_new)

Output:

    Col C  Col A & B
0   Col C         10
1   Col C         20
2   Col B        100
4   Col A       1000
6   Col C         30
7   Col C         40
8   Col B        200
10  Col A       2000
  • Related