How to rearrange rows in pandas dataframe based on Product and Sub-Product?-CodePudding

Product and Sub-Product Category

Product - Fruits Sub-Product - Apple Orange

Product - Vegetables Sub-Product - Tomato Onion

I am finding a challenge to rearrange the rows of the python dataframe based on product and sub-product by Code. If a Code contains both products Fruits and Vegetable, it should be rearranged to Product and Sub-Product level for example code : 3E45212, Please help!

Df1:
Code    Product     Limit   Value
3A68185 Fruits      0.6     0
3A68185 Apple       0.6     0
3B22979 Apple       3.5     0.430145588
3B22979 Fruits      3.5     0.430145588
3B22979 Orange      0       0
3C67260 Apple       3       1.123774052
3C67260 Fruits      3       1.123774052
3C71601 Vegetables  15  0
3C71601 Tomato      15  0
3E45212 Apple       5   0
3E45212 Fruits      5   0
3E45212 Tomato      35.5    0
3E45212 Onion       0.5 0
3E45212 Vegetables  36  0
3C78910 Fruits      2   1.187282182
3C78910 Apple       2   1.187282182
3C82861 Fruits      64  0.560863589
3C82861 Apple       15  0
3C82861 Orange      49  0.560863589
3D11357 Tomato      25.5    0
3D11357 Onion       0.5 0
3D11357 Vegetables  26  0
3D51126 Onion       0.5 0
3D51126 Vegetables  15  0
3D51126 Tomato      14.5    0
3E20062 Onion       1   0
3E20062 Vegetables  1   0

Expected Output:

Code    Product Limit   Value
3A68185 Fruits   0.6    0
3A68185 Apple    0.6    0
3B22979 Fruits   3.5    0.430145588
3B22979 Apple    3.5    0.430145588
3B22979 Orange   0      0
3C67260 Fruits   3      1.123774052
3C67260 Apple    3      1.123774052
3C71601 Vegetables  15  0
3C71601 Tomato      15  0
3E45212 Fruits      5   0
3E45212 Apple       5   0
3E45212 Vegetables  36  0
3E45212 Tomato      35.5    0
3E45212 Onion       0.5 0
3C78910 Fruits      2   1.187282182
3C78910 Apple       2   1.187282182
3C82861 Fruits      64  0.560863589
3C82861 Apple       15  0
3C82861 Orange      49  0.560863589
3D11357 Vegetables  26  0
3D11357 Tomato  25.5    0
3D11357 Onion   0.5 0
3D51126 Vegetables  15  0
3D51126 Onion   0.5 0
3D51126 Tomato  14.5    0
3E20062 Vegetables  1   0
3E20062 Onion   1   0

Code :

df1.sort_values(by=['Code'],inplace=True)

CodePudding user response：

You may turn Product to an ordered categorical dtype, where you set categories in the needed order:

df1['Product'] = pd.Categorical(df1['Product'], ['Fruits', 'Apple', 'Orange', 'Vegetables', 'Tomato', 'Onion'])

And then group by Code first and then by Product, use sum as aggregate:

df1_grouped = df1.groupby(['Code', 'Product']).agg('sum')

Then you would need to drop rows with empty categories:

df1_grouped[df1_grouped.Limit != 0.0]

Output:

                    Limit     Value
Code    Product                    
3A68185 Fruits        0.6  0.000000
        Apple         0.6  0.000000
3B22979 Fruits        3.5  0.430146
        Apple         3.5  0.430146
3C67260 Fruits        3.0  1.123774
        Apple         3.0  1.123774
3C71601 Vegetables   15.0  0.000000
        Tomato       15.0  0.000000
3C78910 Fruits        2.0  1.187282
        Apple         2.0  1.187282
3C82861 Fruits       64.0  0.560864
        Apple        15.0  0.000000
        Orange       49.0  0.560864
3D11357 Vegetables   26.0  0.000000
        Tomato       25.5  0.000000
        Onion         0.5  0.000000
3D51126 Vegetables   15.0  0.000000
        Tomato       14.5  0.000000
        Onion         0.5  0.000000
3E20062 Vegetables    1.0  0.000000
        Onion         1.0  0.000000
3E45212 Fruits        5.0  0.000000
        Apple         5.0  0.000000
        Vegetables   36.0  0.000000
        Tomato       35.5  0.000000
        Onion         0.5  0.000000

CodePudding user response：

create a dictionary for sort order and then sort the dataframe

# create a sort sequence, Fruits are grouped first and then vegetables,
# fruits within fruit category are sequenced, and similar for vegetable
d={'Fruits':0, 'Apple':1, 'Orange': 2, 
  'Vegetables':10, 'Tomato':11, 'Onion': 12}


# create a temp sort order column and sort according to dictionary
# finally drop the temp sortid column

(df.assign(sortid=df['Product'].map(d))
 .sort_values(['Code','sortid'])
 .drop(columns='sortid'))

Code    Product     Limit   Value
0   3A68185     Fruits  0.6     0.000000
1   3A68185     Apple   0.6     0.000000
3   3B22979     Fruits  3.5     0.430146
2   3B22979     Apple   3.5     0.430146
4   3B22979     Orange  0.0     0.000000
6   3C67260     Fruits  3.0     1.123774
5   3C67260     Apple   3.0     1.123774
7   3C71601     Vegetables  15.0    0.000000
8   3C71601     Tomato  15.0    0.000000
14  3C78910     Fruits  2.0     1.187282
15  3C78910     Apple   2.0     1.187282
16  3C82861     Fruits  64.0    0.560864
17  3C82861     Apple   15.0    0.000000
18  3C82861     Orange  49.0    0.560864
21  3D11357     Vegetables  26.0    0.000000
19  3D11357     Tomato  25.5    0.000000
20  3D11357     Onion   0.5     0.000000
23  3D51126     Vegetables  15.0    0.000000
24  3D51126     Tomato  14.5    0.000000
22  3D51126     Onion   0.5     0.000000
26  3E20062     Vegetables  1.0     0.000000
25  3E20062     Onion   1.0     0.000000
10  3E45212     Fruits  5.0     0.000000
9   3E45212     Apple   5.0     0.000000
13  3E45212     Vegetables  36.0    0.000000
11  3E45212     Tomato  35.5    0.000000
12  3E45212     Onion   0.5     0.000000