I have a dataframe like the following, with ids, timestamps where an observation was made, and a check for a certain condition (and other information):
id ds condYN otherinfo
1 2146-06-03 13:41:00 Y blah
1 2151-11-24 01:39:00 Y etc
2 2147-06-05 15:59:00 Y etc
3 2194-06-13 18:39:00 N etc
3 2196-09-27 18:21:00 Y etc
3 2196-10-27 12:20:00 Y etc
4 2196-11-27 11:20:00 N etc
I want to filter the dataset such that only people (identified by id) that have had the condition (condYN = "Y") are listed, with all observations up to and including the first observation of the condition listed:
id ds condYN otherinfo
1 2146-06-03 13:41:00 Y blah
2 2147-06-05 15:59:00 Y etc
3 2194-06-13 18:39:00 N etc
3 2196-09-27 18:21:00 Y etc
I have a list of tuples, by ID and datestamp of first time the condition was found, for each person):
[(1,2146-06-03 13:41:00),(2,2147-06-05 15:59:00),(3,2196-09-27 18:21:00)]
But am unsure how to do such a filter on the dataset (matches the first item, <= the second item).
CodePudding user response:
You can create a pd.Series
from your list of tuples, with ids as index and dates as values. Then map
the id column of your original dataframe with this series and compare to the ds column. Use this mask in a loc
to select the wanted rows.
l = [(1,'2146-06-03 13:41:00'),(2,'2147-06-05 15:59:00'),(3,'2196-09-27 18:21:00')]
s = pd.Series(map(lambda x: x[1], l), map(lambda x: x[0], l))
print(s)
# 1 2146-06-03 13:41:00
# 2 2147-06-05 15:59:00
# 3 2196-09-27 18:21:00
# dtype: object
res = df.loc[df['ds']<=df['id'].map(s)]
print(res)
# id ds condYN otherinfo
# 0 1 2146-06-03 13:41:00 Y blah
# 2 2 2147-06-05 15:59:00 Y etc
# 3 3 2194-06-13 18:39:00 N etc
# 4 3 2196-09-27 18:21:00 Y etc