,clock_now,competition,market_name,lay,per_day
0,2022-12-30,A,B,-1.0,-1.0
1,2022-12-31,A,B,1.28,0.28
2,2023-01-01,A,B,-1.0,-0.72
3,2023-01-02,A,B,1.0,0.28
4,2023-01-03,A,B,1.0,1.28
5,2023-01-04,A,B,-1.0,-1.72
6,2023-01-04,A,B,-1.0,-1.72
7,2023-01-04,A,B,-1.0,-1.72
The idea is to get the value of the per_day column from the closest previous date relative to the row being analyzed.
For example:
For the rows dated 2023-01-04, look at any row dated 2023-01-03 and check its per_day value: if it is greater than zero, the result is True; if it is less than or equal to zero, False.
The list would look like this:
False
False
True
False
True
True
True
True
My attempt:
df.clock_now = pd.to_datetime(df.clock_now)
df['invest'] = np.where(df.loc[df.clock_now == df['clock_now'] - timedelta(days=1),'per_day'].values[0] > 0,True,False)
But every row comes back False.
There is another problem: the previous date is not guaranteed to be exactly 1 day earlier. It could be 2 or more days earlier, so this approach would fail anyway.
How should I proceed in this case?
CodePudding user response:
Use groupby to group rows with the same date, aggregate with last to get one per_day value per date, then shift by one so that each date is assigned the value of the previous date present in the data (however many days earlier that is). Finally, join the result back onto the original dataframe:
df.join(df.groupby('clock_now').last()['per_day'].shift(1).rename('prev_day') > 0, on='clock_now')
This will output
   Unnamed: 0   clock_now  competition  market_name    lay  per_day  prev_day
0           0  2022-12-30            A            B  -1.00    -1.00     False
1           1  2022-12-31            A            B   1.28     0.28     False
2           2  2023-01-01            A            B  -1.00    -0.72      True
3           3  2023-01-02            A            B   1.00     0.28     False
4           4  2023-01-03            A            B   1.00     1.28      True
5           5  2023-01-04            A            B  -1.00    -1.72      True
6           6  2023-01-04            A            B  -1.00    -1.72      True
7           7  2023-01-04            A            B  -1.00    -1.72      True
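For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same groupby / last / shift idea, using the sample data from the question. The helper name add_invest is hypothetical, the column name invest is borrowed from the question's attempt, and Series.map is used in place of join purely as a stylistic assumption; both should give the same flags.

```python
import io

import pandas as pd

CSV_TEXT = """,clock_now,competition,market_name,lay,per_day
0,2022-12-30,A,B,-1.0,-1.0
1,2022-12-31,A,B,1.28,0.28
2,2023-01-01,A,B,-1.0,-0.72
3,2023-01-02,A,B,1.0,0.28
4,2023-01-03,A,B,1.0,1.28
5,2023-01-04,A,B,-1.0,-1.72
6,2023-01-04,A,B,-1.0,-1.72
7,2023-01-04,A,B,-1.0,-1.72
"""


def add_invest(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: flag rows whose previous date had per_day > 0."""
    out = df.copy()
    # One per_day value per date (the last row of each date); groupby sorts the dates.
    per_date = out.groupby("clock_now")["per_day"].last()
    # Shift by one *date in the data*, not one calendar day, so gaps are fine.
    # The first date has no predecessor: NaN > 0 evaluates to False.
    prev_positive = per_date.shift(1) > 0
    # Broadcast the per-date flag back to every row with that date.
    out["invest"] = out["clock_now"].map(prev_positive)
    return out


df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0, parse_dates=["clock_now"])
result = add_invest(df)
print(result["invest"].tolist())
# → [False, False, True, False, True, True, True, True]

# Same data with 2023-01-03 removed: the 2023-01-04 rows now look back
# two days, to 2023-01-02 (per_day 0.28), and are still flagged True.
gapped = add_invest(df[df["clock_now"] != "2023-01-03"])
print(gapped["invest"].tolist())
```

Because the shift happens over the distinct dates rather than over calendar days, the second call shows that a missing day does not break the lookup, which was the concern raised in the question.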