Specific transpose of my pandas DataFrame

I have a dataframe that I would like to transpose in a certain way, in which the "attr" column values become columns instead of values, while price stays as a column.

I have tried to group the columns and transpose it, but haven't found a way to get where I wanted. This is my dataset:

           attr                      values   price
0         Mærke            Knauf Insulation   24.95
1   Produkttype           Bygningsisolering   24.95
2         Serie                       SPACE   24.95
3         Model                  FORMSTYKKE   24.95
4         Mærke                   Bromiflex   20.00
5   Produkttype                     Rørskål   20.00
6     Materiale       Opskummet polyethylen   20.00
7     Størrelse                      Ø18 MM   20.00
8         Mærke                   Skamowall  190.00
9   Produkttype             Isoleringsplade  190.00
10        Serie                       BASIC  190.00
11    Materiale  Brændt kalk og mikrosilika  190.00
12        Mærke                    Rockwool  210.00
13  Produkttype           Bygningsisolering  210.00
14        Serie                 Terrænbatts  210.00
15    Materiale                     Stenuld  210.00
16        Mærke            Knauf Insulation   65.00
17  Produkttype                   Isolering   65.00

What I want is this:

Mærke             Produkttype        Serie  Model       Materiale              Størrelse  price
Knauf Insulation  Bygningsisolering  SPACE  FORMSTYKKE  NaN                    NaN        24.95
Bromiflex         Rørskål            NaN    NaN         Opskummet polyethylen  Ø18 MM     20.00

I started with df.groupby(["attr", "values"])["price"].mean().reset_index().set_index("attr"), but didn't get the structure I wanted, which most likely involves transposing the dataset.

Any help is highly appreciated!

CodePudding user response：

# produce data
import pandas as pd

df = pd.DataFrame(
    data=[
        ("Mærke", "Knauf Insulation", 24.95),
        ("Produkttype", "Bygningsisolering", 24.95),
        ("Serie", "SPACE", 24.95),
        ("Mærke", "Bromiflex", 20.00),
        ("Produkttype", "Rørskål", 20.00),
        ("Materiale", "Opskummet polyethylen", 20.00),
        ("Størrelse", "Ø18 MM", 20.00),
    ],
    columns=("attr", "values", "price"),
)

# display data
df.head()

# output

          attr             values  price
0        Mærke   Knauf Insulation  24.95
1  Produkttype  Bygningsisolering  24.95
2        Serie              SPACE  24.95
3        Mærke          Bromiflex  20.00
4  Produkttype            Rørskål  20.00


# transform data using the pivot method (this assumes each product has a unique price)
df = df.pivot(columns="attr", values="values", index="price").reset_index()
df.columns.name = None

# show results
df.head()

# output

   price              Materiale             Mærke        Produkttype  Serie  Størrelse
0  20.00  Opskummet polyethylen         Bromiflex            Rørskål    NaN     Ø18 MM
1  24.95                    NaN  Knauf Insulation  Bygningsisolering  SPACE        NaN
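
One caveat with pivoting on price alone: it only works while every product has a different price. The sketch below uses made-up rows (two products that both cost 20.00, which is not the case in the question's data) to show how pivot reacts when that assumption breaks.

```python
import pandas as pd

# Hypothetical data: two different products that happen to share the price 20.00
df = pd.DataFrame(
    data=[
        ("Mærke", "Bromiflex", 20.00),
        ("Produkttype", "Rørskål", 20.00),
        ("Mærke", "Rockwool", 20.00),
        ("Produkttype", "Bygningsisolering", 20.00),
    ],
    columns=("attr", "values", "price"),
)

# The (price, attr) pairs are no longer unique, so pivot cannot reshape
try:
    df.pivot(index="price", columns="attr", values="values")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print(f"pivot failed: {exc}")
```

If your real data can contain equal prices, you need some other key that identifies each product in addition to the price.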

CodePudding user response：

The base capability here is DataFrame.pivot: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html
I have defined a helper column, prod_idx, that increments at each Mærke row, so the pivot still works when price is not unique.

import io
import pandas as pd

df = pd.read_csv(io.StringIO("""           attr                      values   price
0         Mærke            Knauf Insulation   24.95
1   Produkttype           Bygningsisolering   24.95
2         Serie                       SPACE   24.95
3         Model                  FORMSTYKKE   24.95
4         Mærke                   Bromiflex   20.00
5   Produkttype                     Rørskål   20.00
6     Materiale       Opskummet polyethylen   20.00
7     Størrelse                      Ø18 MM   20.00
8         Mærke                   Skamowall  190.00
9   Produkttype             Isoleringsplade  190.00
10        Serie                       BASIC  190.00
11    Materiale  Brændt kalk og mikrosilika  190.00
12        Mærke                    Rockwool  210.00
13  Produkttype           Bygningsisolering  210.00
14        Serie                 Terrænbatts  210.00
15    Materiale                     Stenuld  210.00
16        Mærke            Knauf Insulation   65.00
17  Produkttype                   Isolering   65.00"""), sep=r"\s\s+", engine="python")


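# prod_idx increments at every "Mærke" row, so each product gets its own id
df.assign(prod_idx=df["attr"].eq("Mærke").cumsum()).pivot(
    index=["prod_idx", "price"], columns="attr", values=["values"]
).droplevel(0, axis=1).reset_index()  # drop the outer "values" column level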

   prod_idx   price                   Materiale       Model             Mærke        Produkttype        Serie  Størrelse
0         1   24.95                         NaN  FORMSTYKKE  Knauf Insulation  Bygningsisolering        SPACE        NaN
1         2   20.00       Opskummet polyethylen         NaN         Bromiflex            Rørskål          NaN     Ø18 MM
2         3  190.00  Brændt kalk og mikrosilika         NaN         Skamowall    Isoleringsplade        BASIC        NaN
3         4  210.00                     Stenuld         NaN          Rockwool  Bygningsisolering  Terrænbatts        NaN
4         5   65.00                         NaN         NaN  Knauf Insulation          Isolering          NaN        NaN
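
To get the exact layout asked for in the question (attributes in a fixed order, price last, no helper column), something like the following might work. It is only a sketch: it assumes every product starts with a Mærke row, uses the first eight rows of the question's data, and the names wanted and wide are my own.

```python
import pandas as pd

df = pd.DataFrame(
    data=[
        ("Mærke", "Knauf Insulation", 24.95),
        ("Produkttype", "Bygningsisolering", 24.95),
        ("Serie", "SPACE", 24.95),
        ("Model", "FORMSTYKKE", 24.95),
        ("Mærke", "Bromiflex", 20.00),
        ("Produkttype", "Rørskål", 20.00),
        ("Materiale", "Opskummet polyethylen", 20.00),
        ("Størrelse", "Ø18 MM", 20.00),
    ],
    columns=("attr", "values", "price"),
)

# Column order taken from the desired output in the question
wanted = ["Mærke", "Produkttype", "Serie", "Model", "Materiale", "Størrelse", "price"]

wide = (
    # Assumption: a new product begins at every "Mærke" row
    df.assign(prod_idx=df["attr"].eq("Mærke").cumsum())
    .pivot(index=["prod_idx", "price"], columns="attr", values="values")
    .reset_index()
    .drop(columns="prod_idx")    # the helper id is no longer needed
    .rename_axis(columns=None)   # remove the leftover "attr" label
    .reindex(columns=wanted)     # put the columns in the requested order
)
print(wide)
```

Passing a list to index in pivot needs pandas 1.1 or newer. Attributes a product does not have come out as NaN.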