Home > Net >  Filter pandas dataframe using combination of id and before a certain datetime (by id)
Filter pandas dataframe using combination of id and before a certain datetime (by id)

Time:03-29

I have a dataframe like the following, with ids, timestamps where an observation was made, and a check for a certain condition (and other information):

id     ds                      condYN     otherinfo    
 1      2146-06-03 13:41:00     Y          blah
 1      2151-11-24 01:39:00     Y          etc
 2      2147-06-05 15:59:00     Y          etc
 3      2194-06-13 18:39:00     N          etc
 3      2196-09-27 18:21:00     Y          etc
 3      2196-10-27 12:20:00     Y          etc
 4      2196-11-27 11:20:00     N          etc

I want to filter the dataset such that only people (identified by id) that have had the condition (condYN = "Y") are listed, with all observations up to and including the first observation of the condition listed:

id     ds                      condYN     otherinfo    
 1      2146-06-03 13:41:00     Y          blah
 2      2147-06-05 15:59:00     Y          etc
 3      2194-06-13 18:39:00     N          etc
 3      2196-09-27 18:21:00     Y          etc
 

I have a list of tuples, by ID and datestamp of first time the condition was found, for each person):

[(1,2146-06-03 13:41:00),(2,2147-06-05 15:59:00),(3,2196-09-27 18:21:00)]

But am unsure how to do such a filter on the dataset (matches the first item, <= the second item).

CodePudding user response:

You can create a pd.Series from your list of tuples, with ids as index and dates as values. Then map the id column of your original dataframe with this series and compare to the ds column. Use this mask in a loc to select the wanted rows.

l = [(1,'2146-06-03 13:41:00'),(2,'2147-06-05 15:59:00'),(3,'2196-09-27 18:21:00')]

s = pd.Series(map(lambda x: x[1], l), map(lambda x: x[0], l))
print(s)
# 1    2146-06-03 13:41:00
# 2    2147-06-05 15:59:00
# 3    2196-09-27 18:21:00
# dtype: object

res = df.loc[df['ds']<=df['id'].map(s)]
print(res)
#    id                  ds condYN otherinfo
# 0   1 2146-06-03 13:41:00      Y      blah
# 2   2 2147-06-05 15:59:00      Y       etc
# 3   3 2194-06-13 18:39:00      N       etc
# 4   3 2196-09-27 18:21:00      Y       etc
  • Related