Filter records based on timestamp in pandas dataframe

Time:09-15

I have a pandas dataframe as below:

id location timestamp
001 A 2022-01-01 17:00:00
001 B 2022-01-01 18:00:00
001 B 2022-01-01 18:15:00
002 B 2022-01-01 18:30:00
003 B 2022-01-01 19:00:00
003 A 2022-01-01 20:00:00

I'd like to keep only the records where location is B and the same id has already visited location A (i.e. the B timestamp is later than that id's A timestamp). The desired output is:

id location timestamp
001 B 2022-01-01 18:00:00
001 B 2022-01-01 18:15:00

Thank you for your help!

CodePudding user response:

You can use boolean indexing with the help of groupby.cummax:

# per id, mark every row from its first visit to A onward (in time order)
m1 = df.sort_values('timestamp')['location'].eq('A').groupby(df['id']).cummax()
# identify location B
m2 = df['location'].eq('B')

# keep rows where both conditions above are True
df[m1&m2]

output:

   id location            timestamp
1   1        B  2022-01-01 18:00:00
2   1        B  2022-01-01 18:15:00
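
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the sample in the question; storing id as strings, the names seen_a, is_b and result, and the final astype(bool) are my own choices, not part of the answer above.

```python
import pandas as pd

# Sample data from the question (id kept as strings here: an assumption)
df = pd.DataFrame({
    "id": ["001", "001", "001", "002", "003", "003"],
    "location": ["A", "B", "B", "B", "B", "A"],
    "timestamp": pd.to_datetime([
        "2022-01-01 17:00:00",
        "2022-01-01 18:00:00",
        "2022-01-01 18:15:00",
        "2022-01-01 18:30:00",
        "2022-01-01 19:00:00",
        "2022-01-01 20:00:00",
    ]),
})

# In time order, flag rows at A, then carry that flag forward within each id.
# cummax turns False, True, False into False, True, True per group.
seen_a = (
    df.sort_values("timestamp")["location"].eq("A")
      .groupby(df["id"]).cummax()
      .astype(bool)  # defensive cast so the mask is boolean
)

# Rows located at B
is_b = df["location"].eq("B")

# Both masks are aligned on the original index before being combined
result = df[seen_a & is_b]
print(result)
```

Note that id 003 is dropped because its visit to B happens before its visit to A, and id 002 is dropped because it never visits A.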

CodePudding user response:

You can compare timestamps directly with Python's comparison operators (compared_record is assumed here to be the single row you are comparing against):

filtered_records = records[records["timestamp"] > compared_record["timestamp"]]

The location can be filtered in the same way:

filtered_records = records[records["location"] == compared_record["location"]]

So your final code would be:

filtered_records = records[(records["timestamp"] > compared_record["timestamp"]) & (records["location"] == compared_record["location"])]
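
The snippet above leaves compared_record undefined, so here is one possible way to apply the comparison per id. This is a sketch under my own assumptions: the reference timestamp is taken to be each id's earliest visit to A, and first_a and first_a_per_row are hypothetical helper names that do not appear in the answer.

```python
import pandas as pd

# Sample data from the question (id kept as strings here: an assumption)
records = pd.DataFrame({
    "id": ["001", "001", "001", "002", "003", "003"],
    "location": ["A", "B", "B", "B", "B", "A"],
    "timestamp": pd.to_datetime([
        "2022-01-01 17:00:00",
        "2022-01-01 18:00:00",
        "2022-01-01 18:15:00",
        "2022-01-01 18:30:00",
        "2022-01-01 19:00:00",
        "2022-01-01 20:00:00",
    ]),
})

# Earliest visit to A for each id; ids that never visit A are simply absent
first_a = (
    records.loc[records["location"] == "A"]
           .groupby("id")["timestamp"]
           .min()
)

# Look up that timestamp for every row; missing ids become NaT
first_a_per_row = records["id"].map(first_a)

# Comparisons against NaT evaluate to False, so ids without an A visit drop out
filtered_records = records[
    (records["location"] == "B")
    & (records["timestamp"] > first_a_per_row)
]
print(filtered_records)
```

Each condition is wrapped in parentheses and combined with &, since pandas masks are combined element-wise rather than with Python's and.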