open high low close Volume
ctime
2022-11-07 01:00:00.012000 01:00 20900.0 20900.0 20900.0 20900.0 209.00
2022-11-07 01:00:00.019000 01:00 20900.1 20900.1 20900.1 20900.1 1254.00
2022-11-07 01:00:00.111000 01:00 20900.0 20900.0 20900.0 20900.0 11390.50
2022-11-07 01:00:00.188000 01:00 20900.1 20900.1 20900.1 20900.1 20.90
2022-11-07 01:00:00.194000 01:00 20900.0 20900.0 20900.0 20900.0 4995.10
2022-11-07 01:00:00.203000 01:00 20900.0 20900.0 20900.0 20900.0 209.00
2022-11-07 01:00:00.217000 01:00 20900.0 20900.0 20900.0 20900.0 239827.50
2022-11-07 01:00:00.237000 01:00 20900.0 20900.0 20900.0 20900.0 20.90
2022-11-07 01:00:03.028000 01:00 20900.0 20900.0 20900.0 20900.0 20.90
2022-11-07 01:00:03.743000 01:00 20900.0 20900.0 20900.0 20900.0 41.80
I need to gather in periods of 5 trades, that is to say that the open is the first row, that the close is the 5th trade and the minimum and the maximum is the corresponding minimum and maximum of the 5 trades, the same for the volume. there should be only 2 rows left in the exposed example once the transformation is done.
I have tried searching pandas for an approximation but I have not found a satisfactory solution
CodePudding user response:
Define a new series that put the trades into groups of 5, then aggregate per group:
trade_group = np.arange(len(df)) // 5
# The `open` of the group is the `open` of the first trade in the group
# `high` is max of the `high` in the group
# and so on...
df.groupby(trade_group).agg({
"open": "first",
"high": "max",
"low": "min",
"close": "last",
"Volume": "sum"
})
CodePudding user response:
You can use slices on a pandas table to grab 5 items at a time. First, create a for loop starting at zero and counting by 5 up to the size of your dataframe. Then use .iloc
to select a subset of the dataframe by index location. With the subset, create a new row with all of the min/max calculations you're interested in. Last, create a new dataframe from the rows.
rows = []
for start in range(0, len(df), 5):
sub = df.iloc[start:start 5]
rows.append([
sub.ctime.min(), # open time
sub.ctime.max(), # close time
sub.open.min(), # min price
sub.close.max(), # max price
sub.Volume.min(), # min volume
sub.Volume.max(), # max volume
])
summary = pd.DataFrame(rows, columns=['Open', 'Close', 'Min', 'Max', 'MinVol', 'MaxVol'])
This gives the output:
Open Close Min Max MinVol MaxVol
0 2022-11-07 01:00:00.012000 2022-11-07 01:00:00.194000 20900.0 20900.1 20.9 11390.5
1 2022-11-07 01:00:00.203000 2022-11-07 01:00:03.743000 20900.0 20900.0 20.9 39827.5