Home > other >  How to delete group of rows if in certain column value doesn't met the condition?
How to delete group of rows if in certain column value doesn't met the condition?

Time:10-21

For example, here is the table (df):

Number Code Action DateTime
7271 1 send 2021-10-20 13:12:18
7271 1 get 2021-10-20 13:12:20
7271 1 take 2021-10-20 13:12:21
7271 1 reply 2021-10-20 13:12:25
7271 1 send: 2021-10-20 13:15:18
7271 1 get 2021-10-20 13:15:20
7271 5 take 2021-10-20 13:15:21
7271 5 reply 2021-10-20 13:15:25

and I want to delete group of rows, where in column 'Action' string "send" has no ":" (colon). I mean, I want to have a table where in column 'Action', string 'send' consist a colon and also that row's group. By row's group, I mean rows with the same Number and Datetime(in some interval). Also, Numbers can repeat but with different DateTime. There are over 100 000 data.

New table should look like this:

Number Priority Action DateTime
7271 1 send: 2021-10-20 13:15:18
7271 1 get 2021-10-20 13:15:20
7271 5 take 2021-10-20 13:15:21
7271 5 reply 2021-10-20 13:15:25

P.S: string 'send' with a colon ('send:') means that a client with a certain number is replied.

CodePudding user response:

Use a boolean mask:

mask = df.loc[df['Action'].str.startswith('send'), 'Action'] \
         .str.endswith(':').reindex(df.index).ffill()

df = df[mask]
print(df)

# Output:
   Number  Code Action             DateTime
4    7271     1  send:  2021-10-20 13:15:18
5    7271     1    get  2021-10-20 13:15:20
6    7271     5   take  2021-10-20 13:15:21
7    7271     5  reply  2021-10-20 13:15:25

CodePudding user response:

You can create groups by values after send by compare with Series.str.startswith with cumualtive sum by Series.cumsum and filter only groups if send: in Series.isin:

m1 = df['Action'].str.startswith('send')
m2 = df['Action'].eq('send:')
s = m1.cumsum()

df = df[s.isin(s[m2])]
print (df)
   Number  Code Action             DateTime
4    7271     1  send:  2021-10-20 13:15:18
5    7271     1    get  2021-10-20 13:15:20
6    7271     5   take  2021-10-20 13:15:21
7    7271     5  reply  2021-10-20 13:15:25
  • Related