I have a DataFrame like this:
import pandas as pd
# create DataFrame
df = pd.DataFrame({'store': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'quarter': [1, 1, 2, 2, 1, 1, 2, 2],
                   'employee': ['Andy', 'Bob', 'Chad', 'Diane',
                                'Elana', 'Frank', 'George', 'Hank']})
I want to collapse the repeated rows by concatenating the values in the employee column. The only way I can think of doing that is like this:
#group by store and quarter, then concatenate employee strings
df.groupby(['store', 'quarter'], as_index=False).agg({'employee': ' '.join})
store quarter employee
0 A 1 Andy Bob
1 A 2 Chad Diane
2 B 1 Elana Frank
3 B 2 George Hank
This is a minimal reproducible example, but my real DataFrame has a lot of columns. Do I need to list every column name after groupby, or is there another way to do this?
Answer:
You can do this without listing the column names.
Take the df below, for example:
In [1011]: df
Out[1011]:
store quarter employee col1
0 A 1 Andy abc
1 A 1 Bob abc
2 A 2 Chad abc
3 A 2 Diane abc
4 B 1 Elana abc
5 B 1 Frank abc
6 B 2 George abc
7 B 2 Hank abc
Use:
In [1012]: df = df.groupby(['store', 'quarter'], as_index=False).agg(' '.join)
In [1013]: df
Out[1013]:
store quarter employee col1
0 A 1 Andy Bob abc abc
1 A 2 Chad Diane abc abc
2 B 1 Elana Frank abc abc
3 B 2 George Hank abc abc
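This runs agg on every remaining column, i.e. all columns except the ones passed to groupby. Because ' '.join only accepts strings, this assumes those remaining columns contain string values.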
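If some of the non-key columns hold numbers or other non-string values, ' '.join raises a TypeError on them. A minimal sketch of one workaround, assuming a hypothetical numeric column sales next to employee: cast each group's values to str inside the aggregation function, so every non-key column can still be joined without naming it.

```python
import pandas as pd

# Frame from the question plus a hypothetical numeric column "sales"
df = pd.DataFrame({'store': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'quarter': [1, 1, 2, 2, 1, 1, 2, 2],
                   'employee': ['Andy', 'Bob', 'Chad', 'Diane',
                                'Elana', 'Frank', 'George', 'Hank'],
                   'sales': [10, 20, 30, 40, 50, 60, 70, 80]})

keys = ['store', 'quarter']

# The lambda receives one column's values for one group; converting them
# to str first means ' '.join also works on non-string columns like "sales".
out = df.groupby(keys, as_index=False).agg(lambda s: ' '.join(s.astype(str)))
print(out)
```

Note that the joined numeric column becomes a string column (for example "10 20"), which is usually what you want for display but not for further arithmetic.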
Answer:
This gives the result you want (the joined names end up in a column named new):
import pandas as pd

df = pd.DataFrame({'store': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'quarter': [1, 1, 2, 2, 1, 1, 2, 2],
                   'employee': ['Andy', 'Bob', 'Chad', 'Diane',
                                'Elana', 'Frank', 'George', 'Hank']})
# collect each group's names into a list, then join the strings inside each list
df = df.groupby(['store', 'quarter'])['employee'].apply(list).str.join(' ').reset_index(name='new')
df
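The collect-into-a-list idea also works without naming any column. A sketch, reusing the frame with the extra col1 column from the first answer: agg(list) gathers each group's values per column, and Series.str.join then joins the strings inside each list. Series.str.join returns NaN for any list that contains a non-string item, so this assumes all non-key columns hold strings.

```python
import pandas as pd

# Frame with an extra string column, as in the first answer's example
df = pd.DataFrame({'store': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'quarter': [1, 1, 2, 2, 1, 1, 2, 2],
                   'employee': ['Andy', 'Bob', 'Chad', 'Diane',
                                'Elana', 'Frank', 'George', 'Hank'],
                   'col1': ['abc'] * 8})

keys = ['store', 'quarter']

# Step 1: one Python list per (group, column) cell; keys become the index
lists = df.groupby(keys).agg(list)

# Step 2: join each list with a space, column by column,
# then turn the store/quarter index back into ordinary columns
joined = lists.apply(lambda col: col.str.join(' ')).reset_index()
print(joined)
```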