I'm beginner of pandas so I have a question below. There's a lot of answers about groupby rows but I can't find the answer what I want.
anyway my datatable is below.
COLUMN1 COLUMN2 COLUMN3
0 APPLE RED JOHN, JANE
1 BANANA YELLOW SMITH
1 BANANA YELLOW EMILY
2 GRAPE VIOLET JESSICA
2 GRAPE VIOLET REIRA
2 GRAPE VIOLET EMMA
2 GRAPE PURPLE JOE
2 GRAPE PURPLE LISA
3 MELON GREEN RIO
3 MELON GREEN REIRA
..
and I want to get this table. (edit : EXCEPT YELLOW)
COLUMN1 COLUMN2 COLUMN3
0 APPLE RED JOHN, JANE
1 BANANA YELLOW SMITH
1 BANANA YELLOW EMILY
2 GRAPE VIOLET JESSICA, REIRA, EMMA
2 GRAPE PURPLE JOE, LISA
3 MELON GREEN RIO, REIRA
..
How can I get this? Please give me a hint or answer then I'll appreciate a lot.. thank you.
CodePudding user response:
must make reproducible example for answer
EXAMPLE
data = [['APPLE', 'RED', 'JOHN, JANE'],
['BANANA', 'YELLOW', 'SMITH'],
['BANANA', 'YELLOW', 'EMILY'],
['GRAPE', 'VIOLET', 'JESSICA'],
['GRAPE', 'VIOLET', 'REIRA'],
['GRAPE', 'VIOLET', 'EMMA'],
['GRAPE', 'PURPLE', 'JOE'],
['GRAPE', 'PURPLE', 'LISA'],
['MELON', 'GREEN', 'RIO'],
['MELON', 'GREEN', 'REIRA']]
df = pd.DataFrame(data, index=[0, 1, 1, 2, 2, 2, 2, 2, 3, 3], columns=['col1', 'col2', 'col3'])
output(df
):
col1 col2 col3
0 APPLE RED JOHN, JANE
1 BANANA YELLOW SMITH
1 BANANA YELLOW EMILY
2 GRAPE VIOLET JESSICA
2 GRAPE VIOLET REIRA
2 GRAPE VIOLET EMMA
2 GRAPE PURPLE JOE
2 GRAPE PURPLE LISA
3 MELON GREEN RIO
3 MELON GREEN REIRA
First
make col3
to list
df1 = df.groupby([df.index, 'col1', 'col2']).agg(list).reset_index()
output(df1
):
level_0 col1 col2 col3
0 0 APPLE RED [JOHN, JANE]
1 1 BANANA YELLOW [SMITH, EMILY]
2 2 GRAPE PURPLE [JOE, LISA]
3 2 GRAPE VIOLET [JESSICA, REIRA, EMMA]
4 3 MELON GREEN [RIO, REIRA]
Second
join col3
except yellow and explode yellow
df1.assign(col3=df1.apply(lambda x: ','.join(x['col3']) if x['col2'] != 'YELLOW' else x['col3'] , axis=1)).explode('col3')
result:
level_0 col1 col2 col3
0 0 APPLE RED JOHN, JANE
1 1 BANANA YELLOW SMITH
1 1 BANANA YELLOW EMILY
2 2 GRAPE PURPLE JOE,LISA
3 2 GRAPE VIOLET JESSICA,REIRA,EMMA
4 3 MELON GREEN RIO,REIRA
if you want level_0
to index, use set_index
or set_axis
and at next time make reproducible exmaple for answer.
CodePudding user response:
import pandas as pd
df = pd.DataFrame({'col1': ['Apple', 'Banana', 'Banana', 'Grape', 'Grape', 'Grape', 'Apple'], 'col2': ['Red', 'Yellow', 'Yellow', 'Violet', 'Violet', 'Purple', 'Red'], 'col3':['John, Jane', 'Smith', 'Emily', 'Jecica', 'Reira', 'Joe', 'Rio']})
df2 = df.groupby(['col1', 'col2'])['col3'].apply(list).reset_index()
df2['col3'] = df2['col3'].apply(lambda x: ', '.join(map(str, x)))
df2
To avoid splitting Yellow of Banana,
import pandas as pd
df = pd.DataFrame({'col1': ['Apple', 'Banana', 'Banana', 'Grape', 'Grape', 'Grape', 'Apple'], 'col2': ['Red', 'Yellow', 'Yellow*', 'Violet', 'Violet', 'Purple', 'Red'], 'col3':['John, Jane', 'Smith', 'Emily', 'Jecica', 'Reira', 'Joe', 'Rio']})
df2 = df.groupby(['col1', 'col2'])['col3'].apply(list).reset_index()
df2['col3'] = df2['col3'].apply(lambda x: ', '.join(map(str, x)))
df2['col2'] = df2['col2'].replace('Yellow*', 'Yellow')
df2