I have a data frame like :
---- ---- -------- ------- --------
| p | a | col1 | col2 | col3 |
---- ---- -------- ------- --------
| p1 | a1 | MANGO1 | APPLE | GUAVA |
| p2 | a2 | MANGO2 | APPLE | GRAPES |
| p3 | a2 | MANGO1 | APPLE | ORANGE |
| p1 | a1 | MANGO2 | APPLE | KIWI |
| p2 | a2 | MANGO1 | APPLE | ORANGE |
---- ---- -------- ------- --------
I want it to transform it to:
---- ---- -------- -------- ------- ------- -------- -------- ------
| p | a | MANGO1 | MANGO2 | APPLE | GUAVA | GRAPES | ORANGE | KIWI |
---- ---- -------- -------- ------- ------- -------- -------- ------
| p1 | a1 | YES | YES | YES | YES | NO | NO | YES |
| p2 | a2 | YES | YES | YES | YES | YES | NO | NO |
| p3 | a2 | YES | NO | YES | NO | NO | YES | NO |
---- ---- -------- -------- ------- ------- -------- -------- ------
The idea is to group by the columns p and a. Transpose the other columns and put YES NO as values.
CodePudding user response:
You can use melt
first to flatten your dataframe then pivot_table
to reshape your dataframe:
out = (df.melt(['p', 'a']).assign(variable='YES')
.pivot_table('variable', ['p', 'a'], 'value', fill_value='NO', aggfunc='first')
.rename_axis(columns=None).reset_index())
Output:
>>> out
p a APPLE GRAPES GUAVA KIWI MANGO1 MANGO2 ORANGE
0 p1 a1 YES NO YES YES YES YES NO
1 p2 a2 YES YES NO NO YES YES YES
2 p3 a2 YES NO NO NO YES NO YES
Setup MRE:
data = {'p': ['p1', 'p2', 'p3', 'p1', 'p2'],
'a': ['a1', 'a2', 'a2', 'a1', 'a2'],
'col1': ['MANGO1', 'MANGO2', 'MANGO1', 'MANGO2', 'MANGO1'],
'col2': ['APPLE', 'APPLE', 'APPLE', 'APPLE', 'APPLE'],
'col3': ['GUAVA', 'GRAPES', 'ORANGE', 'KIWI', 'ORANGE']}
df = pd.DataFrame(data)