Pandas Error: Index contains duplicate entries, cannot reshape

My question may look like a duplicate, since I found other questions about the same error:

Pandas: grouping a column on a value and creating new column headings

Python/Pandas - ValueError: Index contains duplicate entries, cannot reshape

Pandas pivot produces "ValueError: Index contains duplicate entries, cannot reshape"

I tried all the solutions presented in those posts, but none worked. I believe the error may be caused by my dataset's format, which has strings instead of numbers. Here is an example of my dataset:

protocol_no	activity	description
1586212	walk	twice a day
1586212	drive	5 km
1586212	sleep	NaN
1586212	eat	1500 calories
2547852	walk	NaN
2547852	drive	NaN
2547852	eat	3200 calories
2547852	sleep	At least 10 hours

The output I'm trying to achieve is:

protocol_no	walk	drive	sleep	eat
1586212	twice a day	5 km	NaN	1500 calories
2547852	NaN	NaN	At least 10 hours	3200 calories

I tried using pivot and pivot_table with code like this:

df.pivot(index="protocol_no", columns="activity", values="description")

But I'm still getting this error:

ValueError: Index contains duplicate entries, cannot reshape

I have no idea what is going wrong, so any help would be appreciated!

CodePudding user response:

If .pivot() raises the duplicate-index error, try .pivot_table() with aggfunc='first' (or something similar):

df.pivot_table(index="protocol_no", columns="activity", values="description", aggfunc='first')

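This error is common when the same combination of index and column values (here protocol_no and activity) appears in more than one row, so .pivot() cannot decide which value belongs in the cell. It has nothing to do with the values being strings. Passing aggfunc='first' keeps the first non-null value for each combination; for numeric data, something like aggfunc='sum' may be more appropriate, depending on what the duplicates mean.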

Result:

activity    drive            eat              sleep         walk
protocol_no                                                     
1586212      5 km  1500 calories                NaN  twice a day
2547852       NaN  3200 calories  At least 10 hours          NaN
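
To check whether duplicates are really the cause, it can help to list the repeated (protocol_no, activity) pairs before pivoting. The snippet below is only a sketch: the sample posted in the question has no duplicates, so one extra "walk" row ("once a day") is added here as a made-up assumption to reproduce the error. It also shows one way to turn the pivoted result back into a flat table with protocol_no as a regular column, as in the desired output.

```python
import pandas as pd

# Data from the question, plus ONE hypothetical duplicate row
# (1586212 / walk / "once a day") added to trigger the error.
df = pd.DataFrame({
    "protocol_no": [1586212, 1586212, 1586212, 1586212, 1586212,
                    2547852, 2547852, 2547852, 2547852],
    "activity": ["walk", "walk", "drive", "sleep", "eat",
                 "walk", "drive", "eat", "sleep"],
    "description": ["twice a day", "once a day", "5 km", None, "1500 calories",
                    None, None, "3200 calories", "At least 10 hours"],
})

keys = ["protocol_no", "activity"]

# Every row whose (protocol_no, activity) pair occurs more than once.
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# pivot() refuses to reshape when such pairs exist.
try:
    df.pivot(index="protocol_no", columns="activity", values="description")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# pivot_table() aggregates the duplicates instead; 'first' keeps the
# first non-null description for each pair.
wide = df.pivot_table(index="protocol_no", columns="activity",
                      values="description", aggfunc="first")

# Optional: make protocol_no an ordinary column and drop the
# "activity" label that sits above the column headers.
flat = wide.reset_index()
flat.columns.name = None
print(flat)
```

If the rows printed in dupes are unexpected, it may be better to clean them up first (for example with drop_duplicates) than to let aggfunc='first' silently pick one of them.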