Home > Back-end >  Is there a way to resample a pandas dataframe with minute data to daily data?
Is there a way to resample a pandas dataframe with minute data to daily data?

Time:11-07

I'm using the following script to take a csv with minute data and perform some operations like adding new data columns with calculations.

As an edit to include csv data as mentioned in comments:

2022/11/05 23:40:02,1000.13575463,0.02115
2022/11/05 23:41:01,1000.13575463,0.0147
2022/11/05 23:42:02,1000.13573563,0.01058
2022/11/05 23:43:02,1000.13573563,0.00892
2022/11/05 23:44:02,1000.13573563,0.01975
2022/11/05 23:45:02,1000.13573563,0.01291
2022/11/05 23:46:02,1000.13573563,0.00117
2022/11/05 23:47:02,1000.13573563,0.00728
2022/11/05 23:48:02,1000.13573563,0.0093
2022/11/05 23:49:02,1000.13573563,-4e-05
2022/11/05 23:50:02,1000.13567035,0.00603
2022/11/05 23:51:02,1000.13567035,0.01651
2022/11/05 23:52:02,1000.13567035,0.02454
2022/11/05 23:53:03,1000.13567035,0.02232
2022/11/05 23:54:02,1000.13567035,0.02159
2022/11/05 23:55:02,1000.13567035,0.02388
2022/11/05 23:56:01,1000.13565737,0.02777
2022/11/05 23:57:02,1000.13565737,0.01263
2022/11/05 23:58:02,1000.13565737,0.01004
2022/11/05 23:59:02,1000.13561092,0.0137
2022/11/06 00:00:03,1000.13250552,0.014
2022/11/06 00:01:02,1000.1304933,0.0196
2022/11/06 00:02:02,1000.13048128,0.02217
2022/11/06 00:03:02,1000.13045553,0.0278
2022/11/06 00:04:02,1000.13044246,0.03552

I'm trying to resample this dataframe to days rather than minutes so I can play about with the data in different ways, like plotting it using daily data rather than minute, and using the data to work out total days traded etc.

Unfortunately I get this error as soon as the first new column operation comes along : TypeError: unsupported operand type(s) for /: 'SeriesGroupBy' and 'SeriesGroupBy'.

The new columns don't accept the resampled dataframe but I'm sure I'm doing something wrong to cause this.

import pandas as pd
import numpy as np

df = pd.read_csv(r'/home/me/mydata.csv')
df['date_time'] = pd.to_datetime(df['date_time'],format="%Y/%m/%d %H:%M:%S",infer_datetime_format=True)

pd.options.display.max_columns = None
pd.options.display.width = None

df = df.resample('D', on='date_time')

df['draw_down'] = (df.upnl / df.wallet_balance * 100).fillna(0)
df['pnl'] = df.wallet_balance.diff(1).fillna(0)
df["pnl_pct"] = df.wallet_balance.pct_change(1).fillna(0) * 100
df['cum_pnl'] = df.pnl.cumsum(skipna=True)
df['cum_pnl_pct'] = df.pnl_pct.cumsum(skipna=True)

print(df)

CodePudding user response:

As the error says (and the comments imply), .resample() produces an groupby-like object that is ready to perform reducing operations, such as sum(), mean(), size(), count(), etc. Without one of these operations, the resampling operation is undefined.

In other words, what you are doing is similar to:

gb = df.groupby(...)
gb['A'] / gb['B']  # wrong

instead of, for example:

gb['A'].sum() / gb['B'].median()

See pandas resampling user guide for a better explanation and some examples.

CodePudding user response:

In a round about way after some testing I discovered that what I was trying to ask wasn't clear, my purpose for resampling was to get the last row of each day so that I could get the final values for that time period and plot or calculate from there on.

Therefore I searched and found this code which suits my requirements:

df = df.groupby(df['date_time'].dt.date).tail(1)

The output is this:

59   2022-11-05 23:59:02     1000.135611  0.013700   0.001370 -0.000046 -0.000005  0.143291     0.014329
1201 2022-11-06 19:01:02     1001.460683 -1.082203  -0.108062  0.000000  0.000000  1.468363     0.146737

My data set isn't large right now but I believe it will work as I need it to with many days data.

  • Related