Home > other >  How do I generate the following in pandas?
How do I generate the following in pandas?

Time:06-04

I have a data frame like :

 ---- ---- -------- ------- -------- 
| p  | a  |  col1  | col2  |  col3  |
 ---- ---- -------- ------- -------- 
| p1 | a1 | MANGO1 | APPLE | GUAVA  |
| p2 | a2 | MANGO2 | APPLE | GRAPES |
| p3 | a2 | MANGO1 | APPLE | ORANGE |
| p1 | a1 | MANGO2 | APPLE | KIWI   |
| p2 | a2 | MANGO1 | APPLE | ORANGE |
 ---- ---- -------- ------- -------- 

I want it to transform it to:

 ---- ---- -------- -------- ------- ------- -------- -------- ------ 
| p  | a  | MANGO1 | MANGO2 | APPLE | GUAVA | GRAPES | ORANGE | KIWI |
 ---- ---- -------- -------- ------- ------- -------- -------- ------ 
| p1 | a1 | YES    | YES    | YES   | YES   | NO     | NO     | YES  |
| p2 | a2 | YES    | YES    | YES   | YES   | YES    | NO     | NO   |
| p3 | a2 | YES    | NO     | YES   | NO    | NO     | YES    | NO   |
 ---- ---- -------- -------- ------- ------- -------- -------- ------ 

The idea is to group by the columns p and a. Transpose the other columns and put YES NO as values.

CodePudding user response:

You can use melt first to flatten your dataframe then pivot_table to reshape your dataframe:

out = (df.melt(['p', 'a']).assign(variable='YES')
         .pivot_table('variable', ['p', 'a'], 'value', fill_value='NO', aggfunc='first')
         .rename_axis(columns=None).reset_index())

Output:

>>> out
    p   a APPLE GRAPES GUAVA KIWI MANGO1 MANGO2 ORANGE
0  p1  a1   YES     NO   YES  YES    YES    YES     NO
1  p2  a2   YES    YES    NO   NO    YES    YES    YES
2  p3  a2   YES     NO    NO   NO    YES     NO    YES

Setup MRE:

data = {'p': ['p1', 'p2', 'p3', 'p1', 'p2'],
        'a': ['a1', 'a2', 'a2', 'a1', 'a2'],
        'col1': ['MANGO1', 'MANGO2', 'MANGO1', 'MANGO2', 'MANGO1'],
        'col2': ['APPLE', 'APPLE', 'APPLE', 'APPLE', 'APPLE'],
        'col3': ['GUAVA', 'GRAPES', 'ORANGE', 'KIWI', 'ORANGE']}
df = pd.DataFrame(data)
  • Related