I have to following DF
symbol cml_units number_of_shares price time gain_loss cml_cost cash_flow avg_price
1 BP.L 2 2 504.8275 2022-10-04 14:14:11 0.00 1009.65 -1009.65 504.83
3 BP.L 0 -2 504.2625 2022-10-04 14:43:18 -1.13 -0.01 1008.52 0.00
4 AAPL 0 -3 142.4500 2022-10-04 15:28:33 0.00 284.93 427.35 0.00
5 AAPL 6 3 146.4000 2022-10-06 10:13:53 0.00 1151.51 -439.20 191.92
8 AAPL 47 47 171.5200 2022-08-18 13:45:02 0.00 8061.44 -8061.44 171.52
15 AAPL 0 -47 149.8400 2022-09-25 19:18:42 -1018.96 0.00 7042.48 0.00
20 AAPL 10 7 140.0900 2022-10-09 13:53:05 0.00 1692.94 -980.63 169.29
22 AAPL 3 3 142.4500 2022-10-04 09:06:15 0.00 712.31 -427.35 142.46
23 AAPL 0 3 138.3400 2022-10-13 09:38:23 -24.18 0.00 415.02 0.00
29 AAPL 0 7 138.3400 2022-10-13 09:38:26 -12.25 0.00 968.38 0.00
31 AAPL 5 5 138.3400 2022-10-13 09:46:32 0.00 691.70 -691.70 138.34
38 AAPL 0 5 150.3200 2022-11-01 18:42:08 59.90 0.00 751.60 0.00
44 AAPL 1 1 150.2700 2022-11-01 18:42:47 0.00 150.27 -150.27 150.27
55 AAPL 0 1 149.7000 2022-11-14 12:41:36 -0.57 0.00 149.70 0.00
66 BP.L 2 2 562.4942 2022-10-14 12:42:48 0.00 1124.98 -1124.99 562.49
68 AAPL 2 2 149.7000 2022-11-14 14:39:57 0.00 299.40 -299.40 149.70
70 AAPL 0 -2 148.2800 2022-11-15 09:07:41 -2.84 0.00 296.56 0.00
73 BP.L 1 -1 562.1850 2022-11-15 09:12:41 -0.31 562.49 562.18 562.49
74 AAPL 3 3 148.2800 2022-11-15 13:14:36 0.00 444.84 -444.84 148.28
I need to filter out all the rows that are previous to the last time cml_units was 0 for each symbol.
For example on the above DF the result should be:
symbol cml_units number_of_shares price time gain_loss cml_cost cash_flow avg_price
66 BP.L 2 2 562.4942 2022-10-14 12:42:48 0.00 1124.98 -1124.99 562.49
73 BP.L 1 -1 562.1850 2022-11-15 09:12:41 -0.31 562.49 562.18 562.49
74 AAPL 3 3 148.2800 2022-11-15 13:14:36 0.00 444.84 -444.84 148.28
This is because BP.L on 2022-10-14 12:42:48 was the first purchase after cml_units were 0 on the 2022-10-04 14:43:18, and AAPL on the 2022-11-15 13:14:36 was the first purchase after cml_units were 0 on the 2022-11-15 09:07:41.
This DF can be in any shape so I am trying to find an inclusive wholesome way to achieve it, even if the DF have other stocks.
CodePudding user response:
First you should sort your df by time
. Then you can group and concat based on condition:
df = df.sort_values('time')
df_out = pd.DataFrame()
for sym, sub_df in df.groupby('symbol'):
zero_dates = sub_df[(sub_df['cml_units'] == 0)]['time']
if not zero_dates.empty:
last_zero_date = zero_dates.values[-1]
else:
last_zero_date = pd.to_datetime(0)
df_out = pd.concat([df_out, sub_df[sub_df['time'] > last_zero_date]])
print(df_out)
Edit: adding handling of cases where cml_units
is always >0
Output:
symbol cml_units number_of_shares price time gain_loss cml_cost cash_flow avg_price
id
74 AAPL 3 3 148.2800 2022-11-15 13:14:36 0.00 444.84 -444.84 148.28
66 BP.L 2 2 562.4942 2022-10-14 12:42:48 0.00 1124.98 -1124.99 562.49
73 BP.L 1 -1 562.1850 2022-11-15 09:12:41 -0.31 562.49 562.18 562.49