I have a data frame read from a csv file. something like this:
TIMESTAMP | PRICE |
---|---|
2022-08-01 09:30:00.005037 | 151.405000 |
2022-08-01 09:30:00.005038 | 155.405000 |
2022-08-01 09:30:00.005040 | 153.405000 |
2022-08-01 09:31:00.005000 | 161.405000 |
2022-08-01 09:31:00.006038 | 165.405000 |
2022-08-01 09:31:00.007038 | 163.405000 |
The desired result is to group by 1 second and find the lowest, highest, earliest, and last values:
TIMESTAMP | LOWEST_PRICE | HIGHEST_PRICE | EARLIEST_PRICE | LAST_PRICE |
---|---|---|---|---|
2022-08-01 09:30:00.000000 | 151.405000 | 155.405000 | 151.405000 | 153.405000 |
2022-08-01 09:31:00.000000 | 161.405000 | 165.405000 | 161.405000 | 163.405000 |
I have used:
df.resample('1S').agg(['min','max', 'first', 'last'])
but I keep getting an index error, and I don't know how to add a new column to the data frame. Could this be done through a function?
CodePudding user response:
Using a pandas.Grouper object and managing the column index:
df = (df.groupby(pd.Grouper(key='TIMESTAMP', freq='1S'))
        .agg(['min', 'max', 'first', 'last'])
        .dropna()
        .droplevel(0, axis=1)  # drop the 'PRICE' level of the column MultiIndex
        .rename(columns={'min': 'LOWEST_PRICE', 'max': 'HIGHEST_PRICE',
                         'first': 'EARLIEST_PRICE', 'last': 'LAST_PRICE'})
        .reset_index())
print(df)
The output:
TIMESTAMP LOWEST_PRICE HIGHEST_PRICE EARLIEST_PRICE LAST_PRICE
0 2022-08-01 09:30:00 151.405 155.405 151.405 153.405
1 2022-08-01 09:31:00 161.405 165.405 161.405 163.405
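As for the index error with your original `df.resample('1S')` attempt: `resample` requires a `DatetimeIndex` (or an `on=` argument naming a datetime column), so calling it on a frame with a default `RangeIndex` fails. A minimal sketch using data mirroring the question (column names assumed from your sample):

```python
import pandas as pd

# Sample data mirroring the question, with TIMESTAMP as a plain column.
df = pd.DataFrame({
    'TIMESTAMP': pd.to_datetime([
        '2022-08-01 09:30:00.005037', '2022-08-01 09:30:00.005038',
        '2022-08-01 09:30:00.005040', '2022-08-01 09:31:00.005000',
        '2022-08-01 09:31:00.006038', '2022-08-01 09:31:00.007038',
    ]),
    'PRICE': [151.405, 155.405, 153.405, 161.405, 165.405, 163.405],
})

# resample() needs a DatetimeIndex, so move TIMESTAMP into the index first.
out = (df.set_index('TIMESTAMP')['PRICE']
         .resample('1s')                 # newer pandas prefers lowercase 's'
         .agg(['min', 'max', 'first', 'last'])
         .dropna()                       # drop the empty seconds in between
         .rename(columns={'min': 'LOWEST_PRICE', 'max': 'HIGHEST_PRICE',
                          'first': 'EARLIEST_PRICE', 'last': 'LAST_PRICE'})
         .reset_index())
print(out)
```

Because the aggregation is done on the `PRICE` Series rather than the whole frame, there is no column `MultiIndex` to drop, so `droplevel` is not needed here.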