Home > OS >  Filter a DataFrame based on groupby method and a column
Filter a DataFrame based on groupby method and a column

Time:11-16

I have to following DF

   symbol  cml_units  number_of_shares     price                time  gain_loss  cml_cost  cash_flow  avg_price
1    BP.L          2                 2  504.8275 2022-10-04 14:14:11       0.00   1009.65   -1009.65     504.83
3    BP.L          0                -2  504.2625 2022-10-04 14:43:18      -1.13     -0.01    1008.52       0.00
4    AAPL          0                -3  142.4500 2022-10-04 15:28:33       0.00    284.93     427.35       0.00
5    AAPL          6                 3  146.4000 2022-10-06 10:13:53       0.00   1151.51    -439.20     191.92
8    AAPL         47                47  171.5200 2022-08-18 13:45:02       0.00   8061.44   -8061.44     171.52
15   AAPL          0               -47  149.8400 2022-09-25 19:18:42   -1018.96      0.00    7042.48       0.00
20   AAPL         10                 7  140.0900 2022-10-09 13:53:05       0.00   1692.94    -980.63     169.29
22   AAPL          3                 3  142.4500 2022-10-04 09:06:15       0.00    712.31    -427.35     142.46
23   AAPL          0                 3  138.3400 2022-10-13 09:38:23     -24.18      0.00     415.02       0.00
29   AAPL          0                 7  138.3400 2022-10-13 09:38:26     -12.25      0.00     968.38       0.00
31   AAPL          5                 5  138.3400 2022-10-13 09:46:32       0.00    691.70    -691.70     138.34
38   AAPL          0                 5  150.3200 2022-11-01 18:42:08      59.90      0.00     751.60       0.00
44   AAPL          1                 1  150.2700 2022-11-01 18:42:47       0.00    150.27    -150.27     150.27
55   AAPL          0                 1  149.7000 2022-11-14 12:41:36      -0.57      0.00     149.70       0.00
66   BP.L          2                 2  562.4942 2022-10-14 12:42:48       0.00   1124.98   -1124.99     562.49
68   AAPL          2                 2  149.7000 2022-11-14 14:39:57       0.00    299.40    -299.40     149.70
70   AAPL          0                -2  148.2800 2022-11-15 09:07:41      -2.84      0.00     296.56       0.00
73   BP.L          1                -1  562.1850 2022-11-15 09:12:41      -0.31    562.49     562.18     562.49
74   AAPL          3                 3  148.2800 2022-11-15 13:14:36       0.00    444.84    -444.84     148.28

I need to filter out all the rows that are previous to the last time cml_units was 0 for each symbol.

For example on the above DF the result should be:

 symbol  cml_units  number_of_shares     price                time  gain_loss  cml_cost  cash_flow  avg_price
66   BP.L          2                 2  562.4942 2022-10-14 12:42:48       0.00   1124.98   -1124.99     562.49
73   BP.L          1                -1  562.1850 2022-11-15 09:12:41      -0.31    562.49     562.18     562.49
74   AAPL          3                 3  148.2800 2022-11-15 13:14:36       0.00    444.84    -444.84     148.28

This is because BP.L on 2022-10-14 12:42:48 was the first purchase after cml_units were 0 on the 2022-10-04 14:43:18, and AAPL on the 2022-11-15 13:14:36 was the first purchase after cml_units were 0 on the 2022-11-15 09:07:41.

This DF can be in any shape so I am trying to find an inclusive wholesome way to achieve it, even if the DF have other stocks.

CodePudding user response:

First you should sort your df by time. Then you can group and concat based on condition:

df = df.sort_values('time')

df_out = pd.DataFrame()
for sym, sub_df in df.groupby('symbol'):
    zero_dates = sub_df[(sub_df['cml_units'] == 0)]['time']
    if not zero_dates.empty:
        last_zero_date = zero_dates.values[-1]
    else:
        last_zero_date = pd.to_datetime(0)
    df_out = pd.concat([df_out, sub_df[sub_df['time'] > last_zero_date]])

print(df_out)

Edit: adding handling of cases where cml_units is always >0

Output:

   symbol  cml_units  number_of_shares     price                time  gain_loss  cml_cost  cash_flow  avg_price
id
74   AAPL          3                 3  148.2800 2022-11-15 13:14:36       0.00    444.84    -444.84     148.28
66   BP.L          2                 2  562.4942 2022-10-14 12:42:48       0.00   1124.98   -1124.99     562.49
73   BP.L          1                -1  562.1850 2022-11-15 09:12:41      -0.31    562.49     562.18     562.49
  • Related