For example, here is the table (df):
Number | Code | Action | DateTime |
---|---|---|---|
7271 | 1 | send | 2021-10-20 13:12:18 |
7271 | 1 | get | 2021-10-20 13:12:20 |
7271 | 1 | take | 2021-10-20 13:12:21 |
7271 | 1 | reply | 2021-10-20 13:12:25 |
7271 | 1 | send: | 2021-10-20 13:15:18 |
7271 | 1 | get | 2021-10-20 13:15:20 |
7271 | 5 | take | 2021-10-20 13:15:21 |
7271 | 5 | reply | 2021-10-20 13:15:25 |
and I want to delete group of rows, where in column 'Action' string "send" has no ":" (colon). I mean, I want to have a table where in column 'Action', string 'send' consist a colon and also that row's group. By row's group, I mean rows with the same Number and Datetime(in some interval). Also, Numbers can repeat but with different DateTime. There are over 100 000 data.
New table should look like this:
Number | Priority | Action | DateTime |
---|---|---|---|
7271 | 1 | send: | 2021-10-20 13:15:18 |
7271 | 1 | get | 2021-10-20 13:15:20 |
7271 | 5 | take | 2021-10-20 13:15:21 |
7271 | 5 | reply | 2021-10-20 13:15:25 |
P.S: string 'send' with a colon ('send:') means that a client with a certain number is replied.
CodePudding user response:
Use a boolean mask:
mask = df.loc[df['Action'].str.startswith('send'), 'Action'] \
.str.endswith(':').reindex(df.index).ffill()
df = df[mask]
print(df)
# Output:
Number Code Action DateTime
4 7271 1 send: 2021-10-20 13:15:18
5 7271 1 get 2021-10-20 13:15:20
6 7271 5 take 2021-10-20 13:15:21
7 7271 5 reply 2021-10-20 13:15:25
CodePudding user response:
You can create groups by values after send
by compare with Series.str.startswith
with cumualtive sum by Series.cumsum
and filter only groups if send:
in Series.isin
:
m1 = df['Action'].str.startswith('send')
m2 = df['Action'].eq('send:')
s = m1.cumsum()
df = df[s.isin(s[m2])]
print (df)
Number Code Action DateTime
4 7271 1 send: 2021-10-20 13:15:18
5 7271 1 get 2021-10-20 13:15:20
6 7271 5 take 2021-10-20 13:15:21
7 7271 5 reply 2021-10-20 13:15:25