How to delete group of rows if in certain column value doesn't met the condition?-CodePudding

For example, here is the table (df):

Number	Code	Action	DateTime
7271	1	send	2021-10-20 13:12:18
7271	1	get	2021-10-20 13:12:20
7271	1	take	2021-10-20 13:12:21
7271	1	reply	2021-10-20 13:12:25
7271	1	send:	2021-10-20 13:15:18
7271	1	get	2021-10-20 13:15:20
7271	5	take	2021-10-20 13:15:21
7271	5	reply	2021-10-20 13:15:25

and I want to delete group of rows, where in column 'Action' string "send" has no ":" (colon). I mean, I want to have a table where in column 'Action', string 'send' consist a colon and also that row's group. By row's group, I mean rows with the same Number and Datetime(in some interval). Also, Numbers can repeat but with different DateTime. There are over 100 000 data.

New table should look like this:

Number	Priority	Action	DateTime
7271	1	send:	2021-10-20 13:15:18
7271	1	get	2021-10-20 13:15:20
7271	5	take	2021-10-20 13:15:21
7271	5	reply	2021-10-20 13:15:25

P.S: string 'send' with a colon ('send:') means that a client with a certain number is replied.

CodePudding user response：

Use a boolean mask:

mask = df.loc[df['Action'].str.startswith('send'), 'Action'] \
         .str.endswith(':').reindex(df.index).ffill()

df = df[mask]
print(df)

# Output:
   Number  Code Action             DateTime
4    7271     1  send:  2021-10-20 13:15:18
5    7271     1    get  2021-10-20 13:15:20
6    7271     5   take  2021-10-20 13:15:21
7    7271     5  reply  2021-10-20 13:15:25

CodePudding user response：

You can create groups by values after send by compare with Series.str.startswith with cumualtive sum by Series.cumsum and filter only groups if send: in Series.isin:

m1 = df['Action'].str.startswith('send')
m2 = df['Action'].eq('send:')
s = m1.cumsum()

df = df[s.isin(s[m2])]
print (df)
   Number  Code Action             DateTime
4    7271     1  send:  2021-10-20 13:15:18
5    7271     1    get  2021-10-20 13:15:20
6    7271     5   take  2021-10-20 13:15:21
7    7271     5  reply  2021-10-20 13:15:25