Home > database >  how to group by trade and not by time
how to group by trade and not by time

Time:11-12

                                   open     high     low       close    Volume
ctime                                                   
2022-11-07 01:00:00.012000 01:00  20900.0  20900.0  20900.0  20900.0     209.00
2022-11-07 01:00:00.019000 01:00  20900.1  20900.1  20900.1  20900.1    1254.00
2022-11-07 01:00:00.111000 01:00  20900.0  20900.0  20900.0  20900.0   11390.50
2022-11-07 01:00:00.188000 01:00  20900.1  20900.1  20900.1  20900.1      20.90
2022-11-07 01:00:00.194000 01:00  20900.0  20900.0  20900.0  20900.0    4995.10
2022-11-07 01:00:00.203000 01:00  20900.0  20900.0  20900.0  20900.0     209.00
2022-11-07 01:00:00.217000 01:00  20900.0  20900.0  20900.0  20900.0  239827.50
2022-11-07 01:00:00.237000 01:00  20900.0  20900.0  20900.0  20900.0      20.90
2022-11-07 01:00:03.028000 01:00  20900.0  20900.0  20900.0  20900.0      20.90
2022-11-07 01:00:03.743000 01:00  20900.0  20900.0  20900.0  20900.0      41.80

I need to gather in periods of 5 trades, that is to say that the open is the first row, that the close is the 5th trade and the minimum and the maximum is the corresponding minimum and maximum of the 5 trades, the same for the volume. there should be only 2 rows left in the exposed example once the transformation is done.

I have tried searching pandas for an approximation but I have not found a satisfactory solution

CodePudding user response:

Define a new series that put the trades into groups of 5, then aggregate per group:

trade_group = np.arange(len(df)) // 5

# The `open` of the group is the `open` of the first trade in the group
# `high` is max of the `high` in the group
# and so on...
df.groupby(trade_group).agg({
    "open": "first",
    "high": "max",
    "low": "min",
    "close": "last",
    "Volume": "sum"
})

CodePudding user response:

You can use slices on a pandas table to grab 5 items at a time. First, create a for loop starting at zero and counting by 5 up to the size of your dataframe. Then use .iloc to select a subset of the dataframe by index location. With the subset, create a new row with all of the min/max calculations you're interested in. Last, create a new dataframe from the rows.

rows = []
for start in range(0, len(df), 5):
    sub = df.iloc[start:start   5]
    rows.append([
        sub.ctime.min(),   # open time
        sub.ctime.max(),   # close time
        sub.open.min(),    # min price
        sub.close.max(),   # max price
        sub.Volume.min(),  # min volume
        sub.Volume.max(),  # max volume
    ])
summary = pd.DataFrame(rows, columns=['Open', 'Close', 'Min', 'Max', 'MinVol', 'MaxVol'])

This gives the output:

  Open                       Close                      Min     Max   MinVol  MaxVol
0 2022-11-07 01:00:00.012000 2022-11-07 01:00:00.194000 20900.0 20900.1 20.9 11390.5
1 2022-11-07 01:00:00.203000 2022-11-07 01:00:03.743000 20900.0 20900.0 20.9 39827.5
  • Related