Pandas group rows by 1 second interval then find the highest/lowest/earliest/last value

Time:12-25

I have a data frame read from a CSV file, something like this:

TIMESTAMP PRICE
2022-08-01 09:30:00.005037 151.405000
2022-08-01 09:30:00.005038 155.405000
2022-08-01 09:30:00.005040 153.405000
2022-08-01 09:31:00.005000 161.405000
2022-08-01 09:31:00.006038 165.405000
2022-08-01 09:31:00.007038 163.405000

The desired result is to group the rows into 1-second intervals and find the lowest, highest, earliest, and last price in each interval:

TIMESTAMP LOWEST_PRICE HIGHEST_PRICE EARLIEST_PRICE LAST_PRICE
2022-08-01 09:30:00.000000 151.405000 155.405000 151.405000 153.405000
2022-08-01 09:31:00.000000 161.405000 165.405000 161.405000 163.405000

I have used:

df.resample('1S').agg(['min','max', 'first', 'last'])

but I keep getting an index error, and I don't know how to add the new columns to the data frame. Could this be done with a function?

CodePudding user response:

Group on the TIMESTAMP column with a pandas.Grouper object, then flatten and rename the resulting column index:

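import pandas as pd

# TIMESTAMP must be a datetime column (needed if read_csv did not already parse it)
df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'])

# Bin into 1-second buckets, aggregate PRICE, drop empty seconds, then flatten and rename the columns
result = df.groupby(pd.Grouper(key='TIMESTAMP', freq='1S')).agg(['min','max', 'first', 'last'])\
    .dropna().droplevel(0, axis=1)\
    .rename(columns={'min': 'LOWEST_PRICE','max': 'HIGHEST_PRICE',
                     'first': 'EARLIEST_PRICE', 'last':'LAST_PRICE'}).reset_index()

print(result)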

The output:

            TIMESTAMP  LOWEST_PRICE  HIGHEST_PRICE  EARLIEST_PRICE  LAST_PRICE
0 2022-08-01 09:30:00       151.405        155.405         151.405     153.405
1 2022-08-01 09:31:00       161.405        165.405         161.405     163.405
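
As for the original resample attempt: resample works on the index by default, so it most likely failed because TIMESTAMP was an ordinary column rather than a DatetimeIndex. Below is a minimal sketch of the same aggregation using resample with its on= parameter, wrapped in a function as the question asks. The helper name summarize_per_second and the inline sample frame are made up for illustration; the lowercase "1s" alias is used because newer pandas releases warn about the uppercase "S".

```python
import pandas as pd

# Sample data mirroring the question (built inline instead of read from a CSV)
df = pd.DataFrame({
    "TIMESTAMP": pd.to_datetime([
        "2022-08-01 09:30:00.005037",
        "2022-08-01 09:30:00.005038",
        "2022-08-01 09:30:00.005040",
        "2022-08-01 09:31:00.005000",
        "2022-08-01 09:31:00.006038",
        "2022-08-01 09:31:00.007038",
    ]),
    "PRICE": [151.405, 155.405, 153.405, 161.405, 165.405, 163.405],
})


def summarize_per_second(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-second min/max/first/last of PRICE."""
    return (
        # on= tells resample to bin on a column instead of the index
        frame.resample("1s", on="TIMESTAMP")["PRICE"]
        .agg(["min", "max", "first", "last"])
        .dropna()  # seconds with no rows come back as NaN; drop them
        .rename(columns={
            "min": "LOWEST_PRICE",
            "max": "HIGHEST_PRICE",
            "first": "EARLIEST_PRICE",
            "last": "LAST_PRICE",
        })
        .reset_index()  # turn the TIMESTAMP index back into a column
    )


result = summarize_per_second(df)
print(result)
```

Selecting the PRICE column before aggregating keeps the result columns flat, so no droplevel step is needed. Note that "first" and "last" follow row order within each second, so the frame should already be sorted by TIMESTAMP.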