How pivot/pivot table the pandas dataframe with multiple column values-CodePudding

I have the dataframe. `

data = pd.DataFrame([['Benz', 'MinSpeed', 0, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'TopSpeed', 200, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'ChasisNum', 654121, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'Seats', 5, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'AirBags', 5, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'VehicleType', np.nan, 'Sedan', 'USA', '2022-08-12'],
                     ['Benz', 'Color', np.nan, 'Black','USA', '2022-08-12'],
                     ['Benz', 'InternetInside', np.nan, 'Yes','USA', '2022-08-12'],
                     
                     ['Ferrari', 'MinSpeed', 0, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'TopSpeed', 250, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'ChasisNum', 781121, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'Seats', 4, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'AirBags', 2, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'VehicleType', np.nan, 'SUV', 'France', '2022-12-25'],
                     ['Ferrari', 'Color', np.nan, 'Red','France', '2022-12-25'],
                     ['Ferrari', 'InternetInside', np.nan, 'No','France', '2022-12-25'],
                     ], 
                    columns= ['CarModel', 'Features', 'NumericalValues', 'CategoricalValues','Country', 'DeliveryDate'])

I am trying the pivot the data using the pivot function but getting repeated columns for "NumericalValues" and "CategoricalValues" values

Code: `

data.pivot(index='CarModel', columns='Features', values=['NumericalValues','CategoricalValues' ]).reset_index()

I need the expected output as: `

output_data = pd.DataFrame([['Benz', 0, 200, 654121, 5, 5, 'Sedan', 'Black', 'Yes', 'USA', '2022-08-12'],
                         ['Ferrari', 0, 250, 781121, 4, 2, 'SUV', 'Red', 'No', 'France', '2022-12-25']
                     ],
                    columns=['CarModel', 'MinSpeed', 'TopSpeed', 'ChasisNum','Seats', 'AirBags', 'VehicleType', 'Color', 'InternetInside', 'Country', 'DeliveryDate'])

` I tried with Pivot table as well but unable to get this output.

CodePudding user response：

You can perform your pivot, then run groupby.first on the columns to get rid of the unwanted columns:

out = (data
  .pivot(index=['CarModel', 'Country', 'DeliveryDate'],
         columns='Features'
        )
  .groupby(level='Features', axis=1).first()
  .reset_index()
)

Output:

Features CarModel Country DeliveryDate  AirBags  ChasisNum  Color InternetInside  MinSpeed  Seats  TopSpeed VehicleType
0            Benz     USA   2022-08-12      5.0   654121.0  Black            Yes       0.0    5.0     200.0       Sedan
1         Ferrari  France   2022-12-25      2.0   781121.0    Red             No       0.0    4.0     250.0         SUV

The advantage is that it maintains the dtypes:

Features
CarModel           object
Country            object
DeliveryDate       object
AirBags           float64
ChasisNum         float64
Color              object
InternetInside     object
MinSpeed          float64
Seats             float64
TopSpeed          float64
VehicleType        object
dtype: object

CodePudding user response：

Another possible solution, using pandas.pivot_table:

out = (data.pivot_table(
    index=['CarModel',  'Country', 'DeliveryDate'], 
    columns='Features', values=['NumericalValues', 'CategoricalValues'],
    aggfunc=max)
       .droplevel(0, axis=1)
       .rename_axis(None, axis=1)
       .reset_index())

Output:

  CarModel Country DeliveryDate  Color InternetInside VehicleType  AirBags  \
0     Benz     USA   2022-08-12  Black            Yes       Sedan      5.0   
1  Ferrari  France   2022-12-25    Red             No         SUV      2.0   

   ChasisNum  MinSpeed  Seats  TopSpeed  
0   654121.0       0.0    5.0     200.0  
1   781121.0       0.0    4.0     250.0

dtypes:

CarModel           object
Country            object
DeliveryDate       object
Color              object
InternetInside     object
VehicleType        object
AirBags           float64
ChasisNum         float64
MinSpeed          float64
Seats             float64
TopSpeed          float64

CodePudding user response：

From your data, as you have a value either in 'NumericalValues or 'CategoricalValues', so you can create a column that combine all the information from both column with fillna, then use the pivot as you did.

res = (
    data.assign(Values=lambda x: x['NumericalValues'].fillna(x['CategoricalValues']))
      .pivot(index='CarModel', columns='Features', values='Values')
      .reset_index().rename_axis(columns=None)
)
print(res)
#   CarModel AirBags ChasisNum  Color InternetInside MinSpeed Seats TopSpeed  \
# 0     Benz     5.0  654121.0  Black            Yes      0.0   5.0    200.0   
# 1  Ferrari     2.0  781121.0    Red             No      0.0   4.0    250.0   

#   VehicleType  
# 0       Sedan  
# 1         SUV