Home > Enterprise >  Pandas Dataframe Groupby join string whilst preserving order of strings [duplicate]
Pandas Dataframe Groupby join string whilst preserving order of strings [duplicate]

Time:09-28

Similar questions have been asked i.e Concatenate strings from multiple rows using Pandas groupby and remove duplicates from the comma separated cell

I would like to concatenate string values in a pandas groupby lambda function, however whilst maintaining the order of the strings, solutions use the set function which does not preserve the order when multiple values are passed in.

df = df.sort_values(
        ['id', 'order_column']
    ).groupby('id').agg(
        {
            'channel': lambda x: ' > '.join(set(x)),
            'value': np.sum
        }
    )

How do I do this whilst maintaining the order of values passed to the sequence? In my example, the data is

1             Email
2         Affiliate
3    Organic Search
4             Email
5    Branded Social
6            Direct
7    Branded Social
8            Direct
9    Branded Social
10            Email
11        Affiliate
12            Email
13           Direct
14            Email
15           Direct
16            Email
17   Branded Social
18           Direct
19   Branded Social

What I get: 'Affiliate > Email > Organic Search > Branded Social > Direct'

What I expect 'Email > Affiliate > Organic Search > Branded Social > Direct'

CodePudding user response:

Use the sort=False parameter in groupby and drop_duplicates instead set:

df = df.sort_values(
        ['id', 'order_column']
    ).groupby('id', sort=False).agg(
        {
            'channel': lambda x: ' > '.join(x.drop_duplicates()),
            'value': np.sum
        }
    )
  • Related