Similar questions have been asked i.e Concatenate strings from multiple rows using Pandas groupby and remove duplicates from the comma separated cell
I would like to concatenate string values in a pandas groupby lambda function, however whilst maintaining the order of the strings, solutions use the set
function which does not preserve the order when multiple values are passed in.
df = df.sort_values(
['id', 'order_column']
).groupby('id').agg(
{
'channel': lambda x: ' > '.join(set(x)),
'value': np.sum
}
)
How do I do this whilst maintaining the order of values passed to the sequence? In my example, the data is
1 Email
2 Affiliate
3 Organic Search
4 Email
5 Branded Social
6 Direct
7 Branded Social
8 Direct
9 Branded Social
10 Email
11 Affiliate
12 Email
13 Direct
14 Email
15 Direct
16 Email
17 Branded Social
18 Direct
19 Branded Social
What I get: 'Affiliate > Email > Organic Search > Branded Social > Direct'
What I expect 'Email > Affiliate > Organic Search > Branded Social > Direct'
CodePudding user response:
Use the sort=False
parameter in groupby
and drop_duplicates
instead set
:
df = df.sort_values(
['id', 'order_column']
).groupby('id', sort=False).agg(
{
'channel': lambda x: ' > '.join(x.drop_duplicates()),
'value': np.sum
}
)