I'm using the following script to take a csv with minute data and perform some operations like adding new data columns with calculations.
As an edit to include csv data as mentioned in comments:
2022/11/05 23:40:02,1000.13575463,0.02115
2022/11/05 23:41:01,1000.13575463,0.0147
2022/11/05 23:42:02,1000.13573563,0.01058
2022/11/05 23:43:02,1000.13573563,0.00892
2022/11/05 23:44:02,1000.13573563,0.01975
2022/11/05 23:45:02,1000.13573563,0.01291
2022/11/05 23:46:02,1000.13573563,0.00117
2022/11/05 23:47:02,1000.13573563,0.00728
2022/11/05 23:48:02,1000.13573563,0.0093
2022/11/05 23:49:02,1000.13573563,-4e-05
2022/11/05 23:50:02,1000.13567035,0.00603
2022/11/05 23:51:02,1000.13567035,0.01651
2022/11/05 23:52:02,1000.13567035,0.02454
2022/11/05 23:53:03,1000.13567035,0.02232
2022/11/05 23:54:02,1000.13567035,0.02159
2022/11/05 23:55:02,1000.13567035,0.02388
2022/11/05 23:56:01,1000.13565737,0.02777
2022/11/05 23:57:02,1000.13565737,0.01263
2022/11/05 23:58:02,1000.13565737,0.01004
2022/11/05 23:59:02,1000.13561092,0.0137
2022/11/06 00:00:03,1000.13250552,0.014
2022/11/06 00:01:02,1000.1304933,0.0196
2022/11/06 00:02:02,1000.13048128,0.02217
2022/11/06 00:03:02,1000.13045553,0.0278
2022/11/06 00:04:02,1000.13044246,0.03552
I'm trying to resample this dataframe to days rather than minutes so I can play about with the data in different ways, like plotting it using daily data rather than minute, and using the data to work out total days traded etc.
Unfortunately I get this error as soon as the first new column operation comes along : TypeError: unsupported operand type(s) for /: 'SeriesGroupBy' and 'SeriesGroupBy'
.
The new columns don't accept the resampled dataframe but I'm sure I'm doing something wrong to cause this.
import pandas as pd
import numpy as np
df = pd.read_csv(r'/home/me/mydata.csv')
df['date_time'] = pd.to_datetime(df['date_time'],format="%Y/%m/%d %H:%M:%S",infer_datetime_format=True)
pd.options.display.max_columns = None
pd.options.display.width = None
df = df.resample('D', on='date_time')
df['draw_down'] = (df.upnl / df.wallet_balance * 100).fillna(0)
df['pnl'] = df.wallet_balance.diff(1).fillna(0)
df["pnl_pct"] = df.wallet_balance.pct_change(1).fillna(0) * 100
df['cum_pnl'] = df.pnl.cumsum(skipna=True)
df['cum_pnl_pct'] = df.pnl_pct.cumsum(skipna=True)
print(df)
CodePudding user response:
As the error says (and the comments imply), .resample()
produces an groupby
-like object that is ready to perform reducing operations, such as sum()
, mean()
, size()
, count()
, etc. Without one of these operations, the resampling operation is undefined.
In other words, what you are doing is similar to:
gb = df.groupby(...)
gb['A'] / gb['B'] # wrong
instead of, for example:
gb['A'].sum() / gb['B'].median()
See pandas resampling user guide for a better explanation and some examples.
CodePudding user response:
In a round about way after some testing I discovered that what I was trying to ask wasn't clear, my purpose for resampling was to get the last row of each day so that I could get the final values for that time period and plot or calculate from there on.
Therefore I searched and found this code which suits my requirements:
df = df.groupby(df['date_time'].dt.date).tail(1)
The output is this:
59 2022-11-05 23:59:02 1000.135611 0.013700 0.001370 -0.000046 -0.000005 0.143291 0.014329
1201 2022-11-06 19:01:02 1001.460683 -1.082203 -0.108062 0.000000 0.000000 1.468363 0.146737
My data set isn't large right now but I believe it will work as I need it to with many days data.