I have a dataframe with multiple columns, each sorted in ascending order:
     40  41  42  43  44  45  46  47  48  49
0     1   1   1   1   1   1   1   1   1   1
1     1   1   1   1   1   1   1   1   1   1
2     1   1   1   1   1   2   1   1   1   1
3     1   1   1   1   1   2   1   1   1   1
4     1   1   1   1   1   2   2   1   1   1
..   ..  ..  ..  ..  ..  ..  ..  ..  ..  ..
367  18  26  25  25  30  25  27  27  30  29
368  18  26  26  25  30  25  27  27  31  29
369  18  27  27  25  30  25  27  27  31  29
370  19  27  27  25  30  25  27  27  31  29
371  19  27  27  25  30  25  27  27  31  29
I want to group each column by its own values and run cumcount on it. I know I could iterate through all the columns, but, as people say, you should avoid iteration as much as you can. Is there a more elegant solution?
Answer:
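If you have a reasonable number of columns, using apply on the columns is actually not that bad. It still visits the columns one at a time, but the groupby and cumcount on each column are vectorized: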
df.apply(lambda c: c.groupby(c).cumcount())
Output (computed on only the ten rows shown in the question):
     40  41  42  43  44  45  46  47  48  49
0     0   0   0   0   0   0   0   0   0   0
1     1   1   1   1   1   1   1   1   1   1
2     2   2   2   2   2   0   2   2   2   2
3     3   3   3   3   3   1   3   3   3   3
4     4   4   4   4   4   2   0   4   4   4
367   0   0   0   0   0   0   0   0   0   0
368   1   1   0   1   1   1   1   1   0   1
369   2   0   0   2   2   2   2   2   1   2
370   0   1   1   3   3   3   3   3   2   3
371   1   2   2   4   4   4   4   4   3   4
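For anyone who wants to reproduce this, here is a minimal sketch. It rebuilds only the ten rows visible in the question (the elided rows are left out) and assumes the column labels are integers, which the question does not state. The second half is not part of the answer above: it is one possible loop-free alternative that reshapes to long form, runs a single groupby, and pivots back. The names rows, per_column, long and reshaped are made up for illustration.

```python
import pandas as pd

# The ten rows visible in the question; rows 5..366 are elided there.
rows = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 2, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 2, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 2, 2, 1, 1, 1],
    [18, 26, 25, 25, 30, 25, 27, 27, 30, 29],
    [18, 26, 26, 25, 30, 25, 27, 27, 31, 29],
    [18, 27, 27, 25, 30, 25, 27, 27, 31, 29],
    [19, 27, 27, 25, 30, 25, 27, 27, 31, 29],
    [19, 27, 27, 25, 30, 25, 27, 27, 31, 29],
]
# Assumption: integer column labels 40..49.
df = pd.DataFrame(
    rows,
    index=[0, 1, 2, 3, 4, 367, 368, 369, 370, 371],
    columns=range(40, 50),
)

# The approach from the answer: group each column by its own values
# and number the occurrences within each group.
per_column = df.apply(lambda c: c.groupby(c).cumcount())

# Alternative sketch: reshape to long form (one line per cell),
# run one groupby over (column label, value), then pivot back.
long = (
    df.rename_axis(index="row", columns="col")
    .stack()
    .rename("value")
    .reset_index()
)
long["n"] = long.groupby(["col", "value"]).cumcount()
reshaped = long.pivot(index="row", columns="col", values="n")

print(per_column)
print(reshaped)
```

Both frames should hold the same counts. Whether the reshaped version is actually faster depends on the shape of the data, so it is worth timing both on the real dataframe before switching.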