I have a df something like this
animal age comment
1 cat 1 xyz
2 cat 2 xyz
3 cat 3 xyz
4 cat 4 xyz
5 cat 5 xyz
6 dog 1 xyz
7 dog 2 xyz
8 dog 3 xyz
9 dog 4 xyz
10 dog 5 xyz
It is already sorted by animal and age. My task is to replace the comment for top two rows in each animal with a certain text and then the next two rows with another text. And rest of the rows should be deleted.
Desired output:
animal age comment
1 cat 1 young
2 cat 2 young
3 cat 3 old
4 cat 4 old
5 dog 1 young
6 dog 2 young
7 dog 3 old
8 dog 4 old
I am able to do this but in 5-7 steps. I was wondering if there is a more efficient way to do this please.
CodePudding user response:
The trick here is to use cumcount
to create a sequential counter per animal
group, then use np.where
to update values in comment
based on the value of seq counter
i = df.groupby('animal').cumcount()
df['comment'] = np.where(i < 2, 'young', 'old')
df[i < 4]
animal age comment
1 cat 1 young
2 cat 2 young
3 cat 3 old
4 cat 4 old
6 dog 1 young
7 dog 2 young
8 dog 3 old
9 dog 4 old