Finding last occurrence of a particular category within each group and Filter out rows-Pandas-CodePudding

I have a dataset as below:

data = [[1,'bot', 'a'], [1,'cust', 'b'], [1,'bot', 'c'],[1,'cust', 'd'],[1,'agent', 'e'],[1,'cust', 'f'],
       [2,'bot', 'a'],[2,'cust', 'b'],[2,'bot', 'c'],[2,'bot', 'd'],[2,'agent', 'e'],[2,'cust', 'f'],[2,'agent', 'g'],
       [3,'cust', 'h'],[3,'cust', 'i'],[3,'agent', 'k'],[3,'agent', 'l']]
  
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['id', 'sender','text'])
df

I want to remove out filter out records under each id group for a specific category(sender). For example, if I want to filter out 'bot' category, I need to find last bot category occurrence under each group(id) and delete records prior to that occurrence.

Expected output

Tried various approaches with groupby functionality but not getting intented output. Any pointers would be quite helpful

CodePudding user response：

You can use a reverse groupby.cummin for boolean indexing:


m = df.loc[::-1,'sender'].ne('bot').groupby(df['id']).cummin()

out = df[m]

Output:

    id sender text
3    1   cust    d
4    1  agent    e
5    1   cust    f
10   2  agent    e
11   2   cust    f
12   2  agent    g
13   3   cust    h
14   3   cust    i
15   3  agent    k
16   3  agent    l