I have a dataframe
df a b
7 2019-05-01 00:00:01
6 2019-05-02 00:15:01
1 2019-05-06 00:10:01
3 2019-05-09 01:00:01
8 2019-05-09 04:20:01
9 2019-05-12 01:10:01
4 2019-05-16 03:30:01
And
l = [datetime.datetime(2019,05,02), datetime.datetime(2019,05,10), datetime.datetime(2019,05,22) ]
I want to add a column with the following: for each row, find the last date from l that is before it, and add number of days between them. If none of the date is smaller - add the delta from the smallest one. So the new column will be:
df a b. delta
7 2019-05-01 00:00:01 -1
6 2019-05-02 00:15:01 0
1 2019-05-06 00:10:01 4
3 2019-05-09 01:00:01 7
8 2019-05-09 04:20:01 7
9 2019-05-12 01:10:01 2
4 2019-05-16 03:30:01 6
How can I do it?
Thanks
CodePudding user response:
Using merge_asof
to align df['b']
and the list (as Series), then computing the difference:
# ensure datetime
df['b'] = pd.to_datetime(df['b'])
# craft Series for merging (could be combined with line below)
s = pd.Series(l, name='l')
# merge and fillna with minimum date
ref = pd.merge_asof(df['b'], s, left_on='b', right_on='l')['l'].fillna(s.min())
# compute the delta as days
df['delta'] =(df['b']-ref).dt.days
output:
a b delta
0 7 2019-05-01 00:00:01 -1
1 6 2019-05-02 00:15:01 0
2 1 2019-05-06 00:10:01 4
3 3 2019-05-09 01:00:01 7
4 8 2019-05-09 04:20:01 7
5 9 2019-05-12 01:10:01 2
6 4 2019-05-16 03:30:01 6
CodePudding user response:
Here's a one line solution if you your b
column has datetime
object. Otherwise convert it to datetime
object.
df['delta'] = df.apply(lambda x: sorted([x.b - i for i in l], key= lambda y: y.seconds)[0].days, axis=1)
Explanation : To each row you apply a function that :
- Compute the
deltatime
between your row'sdatetime
and everydatetime
present inl
, then store it in a list - Sort this list by the numbers of seconds of each
deltatime
- Get the first value (with the smallest
deltatime
) and return itsdays
CodePudding user response:
this code is seperate this dataset on
- weekday Friday
- year 2014
- day 01
- hour 00
- minute 03
rides['weekday'] = rides.timestamp.dt.strftime("%A")
rides['year'] = rides.timestamp.dt.strftime("%Y")
rides['day'] = rides.timestamp.dt.strftime("%d")
rides['hour'] = rides.timestamp.dt.strftime("%H")
rides["minute"] = rides.timestamp.dt.strftime("%M")