Pandas Error: Index contains duplicate entries, cannot reshape

Time: 10-08

My question may look like a duplicate, since I found several other questions about the same error:

Pandas: grouping a column on a value and creating new column headings

Python/Pandas - ValueError: Index contains duplicate entries, cannot reshape

Pandas pivot produces "ValueError: Index contains duplicate entries, cannot reshape"

I tried all the solutions presented in those posts, but none worked. I believe the error may be caused by the format of my dataset, which contains strings instead of numbers. Here is an example of my dataset:

protocol_no  activity  description
1586212      walk      twice a day
1586212      drive     5 km
1586212      sleep     NaN
1586212      eat       1500 calories
2547852      walk      NaN
2547852      drive     NaN
2547852      eat       3200 calories
2547852      sleep     At least 10 hours
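String values on their own should not trigger this error: pivot() only requires each (protocol_no, activity) pair to appear once, and the values can be of any dtype. A minimal check, assuming the sample above is loaded as shown (the DataFrame literal below is a transcription of that sample, with missing descriptions as NaN):

```python
import numpy as np
import pandas as pd

# Transcription of the sample data; missing descriptions become NaN.
df = pd.DataFrame({
    "protocol_no": [1586212] * 4 + [2547852] * 4,
    "activity": ["walk", "drive", "sleep", "eat",
                 "walk", "drive", "eat", "sleep"],
    "description": ["twice a day", "5 km", np.nan, "1500 calories",
                    np.nan, np.nan, "3200 calories", "At least 10 hours"],
})

# No (protocol_no, activity) pair repeats in this sample, so pivot()
# reshapes the string column without raising.
wide = df.pivot(index="protocol_no", columns="activity", values="description")
print(wide)
```

If this runs cleanly on the sample but the full dataset still fails, the full data most likely contains repeated (protocol_no, activity) pairs.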

The output I'm trying to achieve is:

protocol_no  walk         drive  sleep              eat
1586212      twice a day  5 km   NaN                1500 calories
2547852      NaN          NaN    At least 10 hours  3200 calories

I tried using pivot and pivot_table with code like this:

df.pivot(index="protocol_no", columns="activity", values="description")

But I'm still getting this error:

ValueError: Index contains duplicate entries, cannot reshape

I have no idea what is going wrong, so any help would be appreciated!

Answer:

Try using .pivot_table() with aggfunc='first' (or a similar aggregation) if .pivot() raises the duplicate-index error:

df.pivot_table(index="protocol_no", columns="activity", values="description", aggfunc='first')

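The error occurs when the same combination of index and column values (here, the same protocol_no and activity) appears in more than one row, so pivot() has no way to decide which description belongs in that cell. pivot_table() resolves the conflict by aggregating the duplicates. Its default aggfunc is 'mean', which cannot average strings, so text values need an explicit aggfunc: aggfunc='first' keeps the first non-missing value for each pair and discards the rest, while aggfunc='sum' can make sense instead when the duplicated values are numbers that should be added up.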
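A sketch of the cause and of one alternative aggregation, using a hypothetical extra row (1586212, walk, "every morning") so that one pair repeats: pivot() then raises, duplicated(keep=False) lists the conflicting rows, and a custom aggfunc keeps every duplicate instead of only the first. The helper join_notna and the "; " separator are illustrative choices, not part of pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "protocol_no": [1586212, 1586212, 1586212, 2547852, 2547852],
    "activity": ["walk", "drive", "walk", "walk", "eat"],
    # "every morning" is a hypothetical second description for (1586212, walk).
    "description": ["twice a day", "5 km", "every morning", np.nan, "3200 calories"],
})

# 1. pivot() cannot put two descriptions into one cell, so it raises ValueError.
try:
    df.pivot(index="protocol_no", columns="activity", values="description")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# 2. List every row whose (protocol_no, activity) pair is not unique.
keys = ["protocol_no", "activity"]
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# 3. Hypothetical helper: join the non-missing descriptions of a group,
#    or return NaN when the group has none.
def join_notna(values: pd.Series) -> object:
    present = values.dropna().astype(str)
    return "; ".join(present) if len(present) else np.nan

joined = df.pivot_table(index="protocol_no", columns="activity",
                        values="description", aggfunc=join_notna)
print(joined)
```

Running step 2 on the real dataset is a quick way to see whether the duplicates are genuine conflicts or repeated copies of the same row, which in turn suggests whether 'first' or a joining aggregation fits better.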

Result:

activity    drive            eat              sleep         walk
protocol_no                                                     
1586212      5 km  1500 calories                NaN  twice a day
2547852       NaN  3200 calories  At least 10 hours          NaN
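A self-contained version of the approach above, assuming the question's sample plus one hypothetical duplicate row. The optional last step (reindex, rename_axis, reset_index) is one way to turn protocol_no back into a regular column and put the activities in the order used in the desired output:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "protocol_no": [1586212] * 5 + [2547852] * 4,
    "activity": ["walk", "drive", "sleep", "eat", "walk",
                 "walk", "drive", "eat", "sleep"],
    # The fifth row ("every morning") is a hypothetical duplicate of (1586212, walk).
    "description": ["twice a day", "5 km", np.nan, "1500 calories", "every morning",
                    np.nan, np.nan, "3200 calories", "At least 10 hours"],
})

# aggfunc="first" keeps the first non-missing description for each pair,
# so the hypothetical "every morning" row is dropped.
wide = df.pivot_table(index="protocol_no", columns="activity",
                      values="description", aggfunc="first")

# Match the desired layout: fixed column order, no "activity" axis label,
# protocol_no as an ordinary column.
flat = (wide.reindex(columns=["walk", "drive", "sleep", "eat"])
            .rename_axis(columns=None)
            .reset_index())
print(flat)
```

Note that 'first' discards later duplicates without warning, so it is worth checking which rows are affected before relying on it.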