I am currently using the following code to combine Col A and Col B to get Col A & B in DataFrame B below:
out = (df
.groupby('Col A', group_keys=False, sort=False)
.apply(lambda d: d.iloc[:, ::-1].unstack().drop_duplicates())
.reset_index(drop=True).to_frame(name='Col A&B')
)
My question is: how can I create the Col C in DataFrame B that uses the column headers from DataFrame A to label where each value in Col A & B came from?
DataFrame A
Col A | Col B | Col C |
---|---|---|
1000 | 100 | 10 |
1000 | 100 | 20 |
2000 | 200 | 30 |
2000 | 200 | 40 |
DataFrame B
Col A & B | Col C |
---|---|
10 | Col C |
20 | Col C |
100 | Col B |
1000 | Col A |
30 | Col C |
40 | Col C |
200 | Col B |
2000 | Col A |
CodePudding user response:
You might want to use stack
drop_duplicates
instead of groupby
:
df.stack().reset_index(level=1).drop_duplicates().rename(columns={0: 'Col A & B', 'level_1': 'Col C'})
Col C Col A & B
0 Col A 100
0 Col B 10
1 Col B 20
2 Col A 200
2 Col B 30
3 Col B 40
CodePudding user response:
You can use groupby
and then use stack
and sort_values
in each group.
df_new = df.groupby('Col A'
).apply(lambda g : g.stack().reset_index().sort_values([0],
ascending=True)
).reset_index(drop=True
).drop('level_0',
axis=1).rename(columns={0: 'Col A & B',
'level_1': 'Col C'}
).drop_duplicates(subset=['Col C',
'Col A & B'])
print(df_new)
Output:
Col C Col A & B
0 Col C 10
1 Col C 20
2 Col B 100
4 Col A 1000
6 Col C 30
7 Col C 40
8 Col B 200
10 Col A 2000