I am trying to pivot a DataFrame that has no numerical values and has duplicate values in the index column. My data looks like this:
sale_id, product, sale_date
101, ABC, 2021-01-01
101, DEF, 2021-02-01
101, XYZ, 2021-03-01
101, KLM, 2021-01-04
Expected output:
sale_id, ABC, DEF, XYZ, KLM
101, 2021-01-01, 2021-02-01, 2021-03-01, 2021-01-04
I tried the following:
df.pivot(index='sale_id', columns='product', values='sale_date')
It threw this error:
ValueError: Index contains duplicate entries, cannot reshape
CodePudding user response:
To test for duplicates, use DataFrame.duplicated:
# keep=False flags every row of a repeated (sale_id, product) pair
df1 = df[df.duplicated(['sale_id', 'product'], keep=False)]
print(df1)
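As a minimal sketch of that check: the rows shown in the question contain no repeated (sale_id, product) pair, so the example below assumes one hypothetical extra row (101, ABC, 2021-05-01) to reproduce the situation that triggers the error. That extra row and the variable name `dupes` are illustrative only and are not from the question.

```python
import pandas as pd

# Sample data from the question, plus ONE hypothetical extra row (the last one)
# that repeats the (sale_id, product) pair (101, "ABC").
df = pd.DataFrame({
    "sale_id": [101, 101, 101, 101, 101],
    "product": ["ABC", "DEF", "XYZ", "KLM", "ABC"],
    "sale_date": ["2021-01-01", "2021-02-01", "2021-03-01",
                  "2021-01-04", "2021-05-01"],
})

# keep=False marks every member of a duplicated group, not only the repeats,
# so both "ABC" rows are returned.
dupes = df[df.duplicated(["sale_id", "product"], keep=False)]
print(dupes)
```

If this returns an empty DataFrame on your real data, the pairs are unique and pivot should not raise this error.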
To remove duplicates, use DataFrame.drop_duplicates:
# Keep the first row of each (sale_id, product) pair, then pivot
(df.drop_duplicates(['sale_id', 'product'])
   .pivot(index='sale_id', columns='product', values='sale_date'))
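A self-contained sketch of the whole fix, again assuming one hypothetical duplicated row that is not in the question's data. It also shows `pivot_table` with `aggfunc="first"` as a possible alternative that keeps the first value per pair without a separate de-duplication step; that alternative is a suggestion, not part of the original answer. Note that both calls sort the product columns alphabetically, so the column order differs from the one in the expected output.

```python
import pandas as pd

# Question's sample data plus one hypothetical duplicate of (101, "ABC").
df = pd.DataFrame({
    "sale_id": [101, 101, 101, 101, 101],
    "product": ["ABC", "DEF", "XYZ", "KLM", "ABC"],
    "sale_date": ["2021-01-01", "2021-02-01", "2021-03-01",
                  "2021-01-04", "2021-05-01"],
})

# Option 1: drop repeated (sale_id, product) pairs (first row wins), then pivot.
deduped = df.drop_duplicates(["sale_id", "product"])
wide = deduped.pivot(index="sale_id", columns="product", values="sale_date")

# Option 2 (assumed alternative): let pivot_table pick the first value per pair.
wide_alt = df.pivot_table(index="sale_id", columns="product",
                          values="sale_date", aggfunc="first")

print(wide)
print(wide_alt)
```

Which row survives depends on row order, so sort the frame first if you need, say, the latest sale_date per product instead of the first.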