I have a dataset as below:
data = [[1,'bot', 'a'], [1,'cust', 'b'], [1,'bot', 'c'],[1,'cust', 'd'],[1,'agent', 'e'],[1,'cust', 'f'],
[2,'bot', 'a'],[2,'cust', 'b'],[2,'bot', 'c'],[2,'bot', 'd'],[2,'agent', 'e'],[2,'cust', 'f'],[2,'agent', 'g'],
[3,'cust', 'h'],[3,'cust', 'i'],[3,'agent', 'k'],[3,'agent', 'l']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['id', 'sender','text'])
df
I want to remove out filter out records under each id group for a specific category(sender). For example, if I want to filter out 'bot' category, I need to find last bot category occurrence under each group(id) and delete records prior to that occurrence.
Expected output
Tried various approaches with groupby functionality but not getting intented output. Any pointers would be quite helpful
CodePudding user response:
You can use a reverse groupby.cummin
for boolean indexing:
m = df.loc[::-1,'sender'].ne('bot').groupby(df['id']).cummin()
out = df[m]
Output:
id sender text
3 1 cust d
4 1 agent e
5 1 cust f
10 2 agent e
11 2 cust f
12 2 agent g
13 3 cust h
14 3 cust i
15 3 agent k
16 3 agent l