Home > front end >  Python Rows to column on specific rule for volume dataframe
Python Rows to column on specific rule for volume dataframe

Time:09-16

So I have a df with department search_term sfr and every 3 lines are identical What I'm looking to do is: unmelt this df , to the output at the bottom where click_share_rank is the anchor

|       |   Unnamed: 0 | department                | search_term                    |   sfr | asin       | clicked_item |   click_share_rank |   click_share |   conversion_share |
|------:|-------------:|:--------------------------|-------------------------------:|:------|:-----------|-------------:|-------------------:|--------------:|-------------------:|
|     0 |            0 | Home and Garden           | air fryer                      |     1 | B07FDJMC9Q | Product A    |                  1 |        0.0647 |             0.0737 |
|     1 |            1 | Home and Garden           | air fryer                      |     1 | B07GJBBGHG | Product B    |                  2 |        0.0499 |             0.0569 |
|     2 |            2 | Home and Garden           | air fryer                      |     1 | B0936FGLQS | Product C    |                  3 |        0.0494 |             0.0628 |
|     6 |            6 | Camera and Photo          | surveillance & security cameras|     1 | B086DKSHQ4 | Product D    |                  1 |        0.1237 |             0.0541 |
|     7 |            7 | Camera and Photo          | surveillance & security cameras|     1 | B086DL32R3 | Product E    |                  2 |        0.1207 |             0.0811 |
|     8 |            8 | Camera and Photo          | surveillance & security cameras|     1 | B086DKGCFP | Product F    |                  3 |        0.1097 |             0.0405 |  

Output Notice that department search_term sfr is like pivoted, asin clicked_item click_share_rank click_share conversion_share is reshaped from long to wide

|       |   Unnamed: 0 | department                | search_term                    |   sfr | #1 asin    | #1 clicked_item |   #1 click_share_rank |   #1 click_share |   #1 conversion_share | #2 asin       | #2 clicked_item |   #2 click_share_rank |   #2 click_share |   #2 conversion_share || #3 asin       | #3 clicked_item |   #3 click_share_rank |   #3 click_share |   #3 conversion_share |
|------:|-------------:|:--------------------------|-------------------------------:|:------|:-----------|-------------:|-------------------:|--------------:|-------------------:|:-----------|-------------:|-------------------:|--------------:|-------------------:|:-----------|-------------:|-------------------:|--------------:|-------------------:|
|     0 |            0 | Home and Garden           | air fryer                      |     1 | B07FDJMC9Q | Product A    |                  1 |        0.0647 |             0.0737 | B07GJBBGHG | Product B    |                  2 |        0.0499 |             0.0569 | B0936FGLQS | Product C    |                  3 |        0.0494 |             0.0628 |
|     1 |            1 | Camera and Photo          | surveillance & security cameras|     1 | B086DKSHQ4 | Product D    |                  1 |        0.1237 |             0.0541 |B086DL32R3 | Product E    |                  2 |        0.1207 |             0.0811 |B086DKGCFP | Product F    |                  3 |        0.1097 |             0.0405 |

I tried pivot_table, but couldn't achieve this output, Any guide of the method to use would be highly appreciated Note: it's a 14m rows df

I have tried

pd.pivot_table(tempo, index=['department','search_term'])

This would solve the first part, however, I can't use the remaining columns as "columns" or "values" else it will get everything screwed, didn't get any other idea so far, only thought of pivot and it wasn't the correct path

Source file: github.com/BeboGhattas/temp-repo/blob/main/sourcefile.csv
Output format: github.com/BeboGhattas/temp-repo/blob/main/desired-output.csv

CodePudding user response:

I think the key is adding a column that specifies which entry in the group a certain row is, then using that column to pivot:

# Limit rows to ensure divisible by 3
tempo = pd.read_csv("https://github.com/BeboGhattas/temp-repo/blob/main/sourcefile.csv?raw=true", nrows=999)

# Add feature which specifies which entry in group
tempo["num"] = [val for _ in range(tempo.shape[0] // 3) for val in range(3)]

# Pivot
piv = pd.pivot_table(tempo, index=["department", "search_term", "sfr"], columns="num", values=["asin", "clicked_item", "click_share_rank", "click_share", "conversion_share"], aggfunc=lambda x: x)

# Rename columns and reset index
piv.columns = [f"#{num   1} {col}" for col, num in piv.columns]
piv = piv.reset_index()

print(piv)

Result

department             search_term  sfr     #1 asin     #2 asin  \
0          APS               air fryer    9  B07GJBBGHG  B07FDJMC9Q   
1          APS                 airpods    8  B09JQMJHXY  B07PXGQC1Q   
2          APS             apple watch    5  B09HF6H3X3  B09G96SSLB   
3          APS        apple watch band    3  B09KNCDSWN  B09TNKY6HP   
4          APS                backpack    6  B06XZTZ7GB  B07V4R4Z77   
..         ...                     ...  ...         ...         ...   
328   Wireless          iphone 11 case    6  B07W45LYY8  B07W4FMQ5Y   
329   Wireless          iphone 13 case    2  B09C1J2HW7  B09D51Z5TR   
330   Wireless  iphone 13 pro max case    4  B096J9ZSG1  B09D5P76MM   
331   Wireless              popsockets    3  B08WVHWYND  B07P29XQR4   
332   Wireless        portable charger    5  B07QXV6N1B  B07XFBN7HX   

        #3 asin  #1 click_share  #2 click_share  #3 click_share  \
0    B07FF117K7          0.0718          0.0653          0.0628   
1    B09JQL3NWT          0.3599          0.3484          0.0360   
2    B09HF6XBPF          0.1208          0.0871          0.0787   
3    B09V5N5SPY          0.0770          0.0498          0.0369   
4    B09HXMVFHK          0.0962          0.0311          0.0190   
..          ...             ...             ...             ...   
328  B09NGMVZF3          0.0429          0.0403          0.0401   
329  B09DPG49YB          0.0543          0.0491          0.0312   
330  B09NB8DN3Y          0.0576          0.0328          0.0319   
331  B08HRP96GQ          0.0718          0.0431          0.0341   
332  B08BRDVGP1          0.1760          0.1678          0.0680   

     #1 click_share_rank  #2 click_share_rank  #3 click_share_rank  \
0                      1                    2                    3   
1                      1                    2                    3   
2                      1                    2                    3   
3                      1                    2                    3   
4                      1                    2                    3   
..                   ...                  ...                  ...   
328                    1                    2                    3   
329                    1                    2                    3   
330                    1                    2                    3   
331                    1                    2                    3   
332                    1                    2                    3   

                                       #1 clicked_item  \
0    COSORI ProI Air Fryer Oven Combo, 5.8QT Max Xl...   
1    Apple AirPods Pro Wireless Earbuds with MagSaf...   
2    Apple Watch Series 7 [GPS 41mm] Smart Watch w/...   
3    6 Pack Sport Bands Compatible with Apple Watch...   
4    Matein Travel Laptop Backpack, Business Anti T...   
..                                                 ...   
328  OTTERBOX COMMUTER SERIES Case for iPhone 11 - ...   
329  Mkeke Compatible with iPhone 13 Case Clear, Tr...   
330  Spigen Ultra Hybrid [Anti-Yellowing Technology...   
331                        PopSockets Clear Phone Grip   
332  Anker Portable Charger, 313 Power Bank (PowerC...   

                                       #2 clicked_item  \
0              Ninja AF101 Air Fryer, 4 Qt, Black/gray   
1    Apple AirPods (2nd Generation) Wireless Earbud...   
2    Apple Watch SE [GPS 40mm] Smart Watch w/ Space...   
3    6 Pack Bands Compatible with Apple Watch Band ...   
4    Extra Large 52L Travel Laptop Backpack with US...   
..                                                 ...   
328  Mkeke Compatible with iPhone 11 Case, Clear Sh...   
329  OTTERBOX COMMUTER SERIES Case for iPhone 13 (O...   
330  OTTERBOX COMMUTER SERIES Case for iPhone 13 Pr...   
331  PopSockets: PopGrip with Swappable Top for Pho...   
332  2-Pack Miady 10000mAh Dual USB Portable Charge...   

                                       #3 clicked_item  #1 conversion_share  \
0    Ultrean Air Fryer, 4.2 Quart (4 Liter) Electri...               0.0913   
1    Apple AirPods (3rd Generation) Wireless Earbud...               0.4961   
2    Apple Watch Series 7 [GPS 45mm] Smart Watch w/...               0.1181   
3    Marge Plus for Apple Watch Band Series 7 6 5 4...               0.1089   
4    Backpack, Laptop Backpack, Carry on Backpack, ...               0.1481   
..                                                 ...                  ...   
328  Cordking iPhone 11 Case, Silicone [Square Edge...               0.0415   
329  Mkeke Compatible with iPhone 13 Case Black, No...               0.0812   
330  SPIDERCASE [3 IN 1] Designed for iPhone 13 Pro...               0.0691   
331  ​​​​PopSockets: Phone Grip with Expanding Kick...               0.0919   
332  Power Bank 26800mAh Portable Charger, IXNINE H...               0.2226   

     #2 conversion_share  #3 conversion_share  
0                 0.0814               0.0660  
1                 0.3494               0.0213  
2                 0.1012               0.0898  
3                 0.0614               0.0410  
4                 0.0545               0.0129  
..                   ...                  ...  
328               0.0468               0.0407  
329               0.0411               0.0392  
330               0.0274               0.0437  
331               0.0504               0.0329  
332               0.1812               0.0607  
  • Related