I have the following dataframe:
index id code data date
0 AZ234 B213 apple 2020-09-01 <- duplicate id, code, data
1 AZ234 B213 apple 2022-02-02 <- duplicate id, code, data
2 AZ234 B213 banana 2020-07-01
3 AZ234 B213 orange 2020-05-11
4 AL612 B309 apple 2020-12-05
5 AL612 B309 banana 2020-07-21
6 AL612 B309 orange 2020-09-21
...
I want to create pivot table to get the following table:
id code apple banana orange
AZ234 B213 2020-09-01 2020-07-01 2020-05-11
AL612 B309 2020-12-05 2020-07-21 2020-09-21
...
I have tried to do this using pivot_table (pandas):
pd.pivot_table(df, values='date', index=['id','code'],
columns=['data'])
but I get this error:
DataError: No numeric types to aggregate
I have read this post but it seems to be a bit different as I don't want to change the columns and also I got error when I tried to set_index with code and id ( " ValueError: Index contains duplicate entries, cannot reshape").
My goal is to create pivot table with dates as values of the table.
CodePudding user response:
There are duplicates per id, date, data
so is necessary add some aggregate function:
If there are datetimes:
df['date'] = pd.to_datetime(df['date'])
df.pivot_table(values='date', index=['id','code'], columns=['data'], aggfunc='first')
df.pivot_table(values='date', index=['id','code'], columns=['data'], aggfunc='max')
If there are strings:
print (df['date'].dtype)
df.pivot_table(values='date', index=['id','code'], columns=['data'], aggfunc=','.join)