I have a data frame read from a csv file. something like this:
TIMESTAMP | PRICE |
---|---|
2022-08-01 09:30:00.005037 | 151.405000 |
2022-08-01 09:30:00.005038 | 155.405000 |
2022-08-01 09:30:00.005040 | 153.405000 |
2022-08-01 09:31:00.005000 | 161.405000 |
2022-08-01 09:31:00.006038 | 165.405000 |
2022-08-01 09:31:00.007038 | 163.405000 |
The desired result is to group by 1 second and find the lowest, highest, earliest, and last values:
TIMESTAMP | LOWEST_PRICE | HIGHEST_PRICE | EARLIEST_PRICE | LAST_PRICE |
---|---|---|---|---|
2022-08-01 09:30:00.000000 | 151.405000 | 155.405000 | 151.405000 | 153.405000 |
2022-08-01 09:31:00.000000 | 161.405000 | 165.405000 | 161.405000 | 163.405000 |
I have used:
df.resample('1S').agg(['min','max', 'first', 'last'])
but I keep getting an index error, and I don't know how to add a new column to the data frame. Could this be done through a function?
CodePudding user response:
Using a pandas.Grouper object and managing the column index:
df = (df.groupby(pd.Grouper(key='TIMESTAMP', freq='1S'))
        .agg(['min', 'max', 'first', 'last'])
        .dropna()
        .droplevel(0, axis=1)  # drop the 'PRICE' level of the column MultiIndex
        .rename(columns={'min': 'LOWEST_PRICE', 'max': 'HIGHEST_PRICE',
                         'first': 'EARLIEST_PRICE', 'last': 'LAST_PRICE'})
        .reset_index())
print(df)
The output:
TIMESTAMP LOWEST_PRICE HIGHEST_PRICE EARLIEST_PRICE LAST_PRICE
0 2022-08-01 09:30:00 151.405 155.405 151.405 153.405
1 2022-08-01 09:31:00 161.405 165.405 161.405 163.405
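As for the index error with your original `df.resample('1S')` attempt: `resample` requires a `DatetimeIndex` (or an `on=` argument naming a datetime column), so calling it on a frame with a default `RangeIndex` fails. A minimal sketch using data mirroring the question (column names assumed from your sample):

```python
import pandas as pd

# Sample data mirroring the question, with TIMESTAMP as a plain column.
df = pd.DataFrame({
    'TIMESTAMP': pd.to_datetime([
        '2022-08-01 09:30:00.005037', '2022-08-01 09:30:00.005038',
        '2022-08-01 09:30:00.005040', '2022-08-01 09:31:00.005000',
        '2022-08-01 09:31:00.006038', '2022-08-01 09:31:00.007038',
    ]),
    'PRICE': [151.405, 155.405, 153.405, 161.405, 165.405, 163.405],
})

# resample() needs a DatetimeIndex, so move TIMESTAMP into the index first.
out = (df.set_index('TIMESTAMP')['PRICE']
         .resample('1s')                 # newer pandas prefers lowercase 's'
         .agg(['min', 'max', 'first', 'last'])
         .dropna()                       # drop the empty seconds in between
         .rename(columns={'min': 'LOWEST_PRICE', 'max': 'HIGHEST_PRICE',
                          'first': 'EARLIEST_PRICE', 'last': 'LAST_PRICE'})
         .reset_index())
print(out)
```

Because the aggregation is done on the `PRICE` Series rather than the whole frame, there is no column `MultiIndex` to drop, so `droplevel` is not needed here.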