How to Concatenate Strings Using GroupBy in Big Data Frames

Time:06-02

I have a data frame like this:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'store': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'quarter': [1, 1, 2, 2, 1, 1, 2, 2],
                   'employee': ['Andy', 'Bob', 'Chad', 'Diane',
                                'Elana', 'Frank', 'George', 'Hank']})

I want to reduce repeated rows by concatenating the values in the employee column. The only way I can think of to do that is this:

#group by store and quarter, then concatenate employee strings
df.groupby(['store', 'quarter'], as_index=False).agg({'employee': ' '.join})

    store   quarter employee
0   A   1   Andy Bob
1   A   2   Chad Diane
2   B   1   Elana Frank
3   B   2   George Hank

This is a minimal reproducible example, but my real data frame has a lot of columns. Do I need to list every column name in the aggregation after groupby, or is there another way to do this?

CodePudding user response:

You can also do this without listing the column names.

Take the df below as an example:

In [1011]: df
Out[1011]: 
  store  quarter employee col1
0     A        1     Andy  abc
1     A        1      Bob  abc
2     A        2     Chad  abc
3     A        2    Diane  abc
4     B        1    Elana  abc
5     B        1    Frank  abc
6     B        2   George  abc
7     B        2     Hank  abc

Use:

In [1012]: df = df.groupby(['store', 'quarter'], as_index=False).agg(' '.join)

In [1013]: df
Out[1013]: 
  store  quarter     employee     col1
0     A        1     Andy Bob  abc abc
1     A        2   Chad Diane  abc abc
2     B        1  Elana Frank  abc abc
3     B        2  George Hank  abc abc

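This runs agg on every column other than the ones named in groupby, so the remaining columns do not have to be listed. Note that ' '.join expects string values, so this assumes all the remaining columns hold strings.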
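If some of the remaining columns are not strings, one possible approach is to build the aggregation mapping from the column dtypes instead of typing the names by hand. The sketch below is only an illustration: the `sales` column is hypothetical, and using `sum` for non-string columns is an assumption that may not suit your data.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

df = pd.DataFrame({
    'store': ['A', 'A', 'B', 'B'],
    'quarter': [1, 1, 1, 1],
    'employee': ['Andy', 'Bob', 'Elana', 'Frank'],
    'sales': [10, 20, 30, 40],  # hypothetical numeric column, not in the question
})

keys = ['store', 'quarter']

# Join string columns with a space; sum everything else (an assumed choice).
agg_map = {
    col: ' '.join if is_string_dtype(df[col]) else 'sum'
    for col in df.columns if col not in keys
}

result = df.groupby(keys, as_index=False).agg(agg_map)
print(result)
```

The mapping is computed once from `df.columns`, so adding more columns to the data frame does not require changing the groupby line.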

CodePudding user response:

This will give you the result you want:

import pandas as pd

df = pd.DataFrame({'store': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'quarter': [1, 1, 2, 2, 1, 1, 2, 2],
                   'employee': ['Andy', 'Bob', 'Chad', 'Diane',
                                'Elana', 'Frank', 'George', 'Hank']})
# select the employee column, join each group's names, and call the result column 'new'
df = df.groupby(['store', 'quarter'])['employee'].agg(' '.join).reset_index(name='new')
df
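As an alternative way to get the renamed column, pandas' named aggregation syntax might be used. This is a sketch on the same sample data; the output name `new` is simply taken from the answer above, and `out` is an illustrative variable name.

```python
import pandas as pd

df = pd.DataFrame({'store': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'quarter': [1, 1, 2, 2, 1, 1, 2, 2],
                   'employee': ['Andy', 'Bob', 'Chad', 'Diane',
                                'Elana', 'Frank', 'George', 'Hank']})

# Named aggregation: output_name=(source column, aggregation function)
out = df.groupby(['store', 'quarter'], as_index=False).agg(
    new=('employee', ' '.join)
)
print(out)
```

This keeps the grouping keys as regular columns and avoids the separate reset_index step.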