Home > front end >  pandas replace text for top N rows for each category in a column
pandas replace text for top N rows for each category in a column

Time:07-15

I have a df something like this

             animal          age          comment
     1        cat             1             xyz
     2        cat             2             xyz
     3        cat             3             xyz
     4        cat             4             xyz
     5        cat             5             xyz
     6        dog             1             xyz
     7        dog             2             xyz
     8        dog             3             xyz
     9        dog             4             xyz
    10        dog             5             xyz

It is already sorted by animal and age. My task is to replace the comment for top two rows in each animal with a certain text and then the next two rows with another text. And rest of the rows should be deleted.

Desired output:

             animal          age          comment
     1        cat             1             young
     2        cat             2             young
     3        cat             3             old
     4        cat             4             old
     5        dog             1             young
     6        dog             2             young
     7        dog             3             old
     8        dog             4             old

I am able to do this but in 5-7 steps. I was wondering if there is a more efficient way to do this please.

CodePudding user response:

The trick here is to use cumcount to create a sequential counter per animal group, then use np.where to update values in comment based on the value of seq counter

i = df.groupby('animal').cumcount()
df['comment'] = np.where(i < 2, 'young', 'old')

df[i < 4]

  animal  age comment
1    cat    1   young
2    cat    2   young
3    cat    3     old
4    cat    4     old
6    dog    1   young
7    dog    2   young
8    dog    3     old
9    dog    4     old
  • Related