I have the dataframe. `
data = pd.DataFrame([['Benz', 'MinSpeed', 0, np.nan, 'USA', '2022-08-12'],
['Benz', 'TopSpeed', 200, np.nan, 'USA', '2022-08-12'],
['Benz', 'ChasisNum', 654121, np.nan, 'USA', '2022-08-12'],
['Benz', 'Seats', 5, np.nan, 'USA', '2022-08-12'],
['Benz', 'AirBags', 5, np.nan, 'USA', '2022-08-12'],
['Benz', 'VehicleType', np.nan, 'Sedan', 'USA', '2022-08-12'],
['Benz', 'Color', np.nan, 'Black','USA', '2022-08-12'],
['Benz', 'InternetInside', np.nan, 'Yes','USA', '2022-08-12'],
['Ferrari', 'MinSpeed', 0, np.nan, 'France', '2022-12-25'],
['Ferrari', 'TopSpeed', 250, np.nan, 'France', '2022-12-25'],
['Ferrari', 'ChasisNum', 781121, np.nan, 'France', '2022-12-25'],
['Ferrari', 'Seats', 4, np.nan, 'France', '2022-12-25'],
['Ferrari', 'AirBags', 2, np.nan, 'France', '2022-12-25'],
['Ferrari', 'VehicleType', np.nan, 'SUV', 'France', '2022-12-25'],
['Ferrari', 'Color', np.nan, 'Red','France', '2022-12-25'],
['Ferrari', 'InternetInside', np.nan, 'No','France', '2022-12-25'],
],
columns= ['CarModel', 'Features', 'NumericalValues', 'CategoricalValues','Country', 'DeliveryDate'])
`
I am trying the pivot the data using the pivot function but getting repeated columns for "NumericalValues" and "CategoricalValues" values
Code: `
data.pivot(index='CarModel', columns='Features', values=['NumericalValues','CategoricalValues' ]).reset_index()
`
I need the expected output as: `
output_data = pd.DataFrame([['Benz', 0, 200, 654121, 5, 5, 'Sedan', 'Black', 'Yes', 'USA', '2022-08-12'],
['Ferrari', 0, 250, 781121, 4, 2, 'SUV', 'Red', 'No', 'France', '2022-12-25']
],
columns=['CarModel', 'MinSpeed', 'TopSpeed', 'ChasisNum','Seats', 'AirBags', 'VehicleType', 'Color', 'InternetInside', 'Country', 'DeliveryDate'])
` I tried with Pivot table as well but unable to get this output.
CodePudding user response:
You can perform your pivot
, then run groupby.first
on the columns to get rid of the unwanted columns:
out = (data
.pivot(index=['CarModel', 'Country', 'DeliveryDate'],
columns='Features'
)
.groupby(level='Features', axis=1).first()
.reset_index()
)
Output:
Features CarModel Country DeliveryDate AirBags ChasisNum Color InternetInside MinSpeed Seats TopSpeed VehicleType
0 Benz USA 2022-08-12 5.0 654121.0 Black Yes 0.0 5.0 200.0 Sedan
1 Ferrari France 2022-12-25 2.0 781121.0 Red No 0.0 4.0 250.0 SUV
The advantage is that it maintains the dtypes:
Features
CarModel object
Country object
DeliveryDate object
AirBags float64
ChasisNum float64
Color object
InternetInside object
MinSpeed float64
Seats float64
TopSpeed float64
VehicleType object
dtype: object
CodePudding user response:
Another possible solution, using pandas.pivot_table
:
out = (data.pivot_table(
index=['CarModel', 'Country', 'DeliveryDate'],
columns='Features', values=['NumericalValues', 'CategoricalValues'],
aggfunc=max)
.droplevel(0, axis=1)
.rename_axis(None, axis=1)
.reset_index())
Output:
CarModel Country DeliveryDate Color InternetInside VehicleType AirBags \
0 Benz USA 2022-08-12 Black Yes Sedan 5.0
1 Ferrari France 2022-12-25 Red No SUV 2.0
ChasisNum MinSpeed Seats TopSpeed
0 654121.0 0.0 5.0 200.0
1 781121.0 0.0 4.0 250.0
dtypes:
CarModel object
Country object
DeliveryDate object
Color object
InternetInside object
VehicleType object
AirBags float64
ChasisNum float64
MinSpeed float64
Seats float64
TopSpeed float64
CodePudding user response:
From your data, as you have a value either in 'NumericalValues or 'CategoricalValues', so you can create a column that combine all the information from both column with fillna
, then use the pivot
as you did.
res = (
data.assign(Values=lambda x: x['NumericalValues'].fillna(x['CategoricalValues']))
.pivot(index='CarModel', columns='Features', values='Values')
.reset_index().rename_axis(columns=None)
)
print(res)
# CarModel AirBags ChasisNum Color InternetInside MinSpeed Seats TopSpeed \
# 0 Benz 5.0 654121.0 Black Yes 0.0 5.0 200.0
# 1 Ferrari 2.0 781121.0 Red No 0.0 4.0 250.0
# VehicleType
# 0 Sedan
# 1 SUV