Home > Net >  Finding last occurrence of a particular category within each group and Filter out rows-Pandas
Finding last occurrence of a particular category within each group and Filter out rows-Pandas

Time:08-19

I have a dataset as below:

data = [[1,'bot', 'a'], [1,'cust', 'b'], [1,'bot', 'c'],[1,'cust', 'd'],[1,'agent', 'e'],[1,'cust', 'f'],
       [2,'bot', 'a'],[2,'cust', 'b'],[2,'bot', 'c'],[2,'bot', 'd'],[2,'agent', 'e'],[2,'cust', 'f'],[2,'agent', 'g'],
       [3,'cust', 'h'],[3,'cust', 'i'],[3,'agent', 'k'],[3,'agent', 'l']]
  
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['id', 'sender','text'])
df

enter image description here

I want to remove out filter out records under each id group for a specific category(sender). For example, if I want to filter out 'bot' category, I need to find last bot category occurrence under each group(id) and delete records prior to that occurrence.

Expected output

enter image description here

Tried various approaches with groupby functionality but not getting intented output. Any pointers would be quite helpful

CodePudding user response:

You can use a reverse groupby.cummin for boolean indexing:


m = df.loc[::-1,'sender'].ne('bot').groupby(df['id']).cummin()

out = df[m]

Output:

    id sender text
3    1   cust    d
4    1  agent    e
5    1   cust    f
10   2  agent    e
11   2   cust    f
12   2  agent    g
13   3   cust    h
14   3   cust    i
15   3  agent    k
16   3  agent    l
  • Related