I have a dataframe with two columns: cluster titles and the chapter they belong to. I would like to create a third column containing the 'order', or position, of each cluster within its chapter.
So, I would like to turn the following dataframe:
cluster_title, chapter
"rabbits", 1
"horses", 1
"cows", 1
"trains", 2
"airplanes", 2
"ships", 2
"carrot", 3
"potato", 3
"tomato", 3
Into something like this:
cluster_title, chapter, position_in_chapter
"rabbits", 1, 1
"horses", 1, 2
"cows", 1, 3
"trains", 2, 1
"airplanes", 2, 2
"ships", 2, 3
"carrot", 3, 1
"potato", 3, 2
"tomato", 3, 3
I tried approaching it with the group_by function and using the index somehow, but either I am missing something obvious (quite likely) or it is the wrong approach, as the resulting object requires extra steps that seem to take me in the wrong direction.
Could someone point me in the right direction?
CodePudding user response:
Try with groupby and cumcount:
df["position_in_chapter"] = df.groupby("chapter").cumcount() + 1
>>> df
cluster_title chapter position_in_chapter
0 rabbits 1 1
1 horses 1 2
2 cows 1 3
3 trains 2 1
4 airplanes 2 2
5 ships 2 3
6 carrot 3 1
7 potato 3 2
8 tomato 3 3
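For anyone who wants to run this end to end, here is a minimal, self-contained sketch. It assumes pandas is installed and rebuilds the sample data from the question inline; the only logic is the groupby/cumcount line from above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "cluster_title": ["rabbits", "horses", "cows",
                      "trains", "airplanes", "ships",
                      "carrot", "potato", "tomato"],
    "chapter": [1, 1, 1, 2, 2, 2, 3, 3, 3],
})

# cumcount() numbers the rows inside each group starting at 0,
# following the order in which they appear in the frame,
# so adding 1 gives a 1-based position within the chapter.
df["position_in_chapter"] = df.groupby("chapter").cumcount() + 1

print(df)
```

Since the numbering follows the existing row order, the frame should already be sorted the way you want the positions assigned (for example with sort_values) before the column is computed.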