Get the value corresponding to the start of a given category created with datetime64[ns]

I have a DataFrame holding a time series with one row per minute, running up to today. All timestamps are datetime64[ns].

Input:

timestamp                   A       
2021-01-01 00:00:00.000     2000   
2021-01-01 00:01:00.000     2005   
2021-01-01 00:02:00.000     2010      
...

I create a new column of categorical data made up of 40-day periods, using pd.cut. The output is:

timestamp                   A        period 
2021-01-01 00:00:00.000     2000     (2021-12-31 23:59:59.999999999, 2022-02-10]
2021-01-01 00:01:00.000     2005     (2021-12-31 23:59:59.999999999, 2022-02-10]
2021-01-01 00:02:00.000     2010     (2021-12-31 23:59:59.999999999, 2022-02-10]
...

The set of periods is:

[(2021-12-31 23:59:59.999999999, 2022-02-10] < (2022-02-10, 2022-03-22] < (2022-03-22, 2022-05-01]]
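For reference, a period column like this could be built roughly as follows. This is only a sketch: the sample data, the start date and the number of bin edges are assumptions chosen so that the resulting intervals match the ones listed above.

```python
import pandas as pd

# Hypothetical minute-level data standing in for the real series.
df = pd.DataFrame({
    "timestamp": pd.date_range("2022-01-01", periods=5, freq="min"),
    "A": [2000, 2005, 2010, 2015, 2020],
})

# Bin edges every 40 days; the start date and edge count are assumptions.
edges = pd.date_range("2022-01-01", periods=4, freq="40D")

# include_lowest=True nudges the first left edge down slightly so that the
# very first timestamp still falls inside the first (right-closed) bin.
df["period"] = pd.cut(df["timestamp"], bins=edges, include_lowest=True)
```

The intervals are right-closed, so a timestamp that sits exactly on an edge belongs to the earlier period.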

I want to create a new column that holds the first value of A within each category. For example:

timestamp                   A        period                                       A_ref
2021-01-01 00:00:00.000     2000     (2021-12-31 23:59:59.999999999, 2022-02-10]  2000
2021-01-01 00:01:00.000     2005     (2021-12-31 23:59:59.999999999, 2022-02-10]  2000
2021-01-01 00:02:00.000     2010     (2021-12-31 23:59:59.999999999, 2022-02-10]  2000
...

Note that the start of a period may not coincide exactly with a timestamp in the data.

What I have tried: df.resample('40D') with 'first' as the aggregator. This gives me the value I am interested in, but what I really want is to fill that value across all rows of the period. Maybe using a category is not necessary?

Thank you

CodePudding user response:

# Take the first A in each period, then merge it back onto every row as A_ref
df.merge(df.groupby('period')['A'].first().reset_index(), on='period', suffixes=('', '_ref'))

                 timestamp  ...  A_ref
0  2021-01-01 00:00:00.000  ...   2000
1  2021-01-01 00:01:00.000  ...   2000
2  2021-01-01 00:02:00.000  ...   2000
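
As a possible alternative to the merge, groupby(...).transform('first') broadcasts the first value of each group back onto every row in one step. The sketch below is not part of the original answer; the sample data and the column name A_ref_grouper are made up for illustration. The second variant shows that the categorical column may not be needed at all, which is what the question asks at the end.

```python
import pandas as pd

# Hypothetical sample: one row every 20 days so that two 40-day bins are hit.
df = pd.DataFrame({
    "timestamp": pd.date_range("2022-01-01", periods=4, freq="20D"),
    "A": [2000, 2005, 2010, 2015],
})

# Variant 1: reuse a categorical "period" column built with pd.cut.
edges = pd.date_range("2022-01-01", periods=4, freq="40D")
df["period"] = pd.cut(df["timestamp"], bins=edges, include_lowest=True)
# transform("first") returns one value per input row, aligned to df's index.
df["A_ref"] = df.groupby("period", observed=True)["A"].transform("first")

# Variant 2: skip the category and group directly on 40-day time bins.
df["A_ref_grouper"] = (
    df.groupby(pd.Grouper(key="timestamp", freq="40D"))["A"].transform("first")
)
```

The two variants treat bin edges differently: pd.cut intervals are right-closed, while pd.Grouper with a daily frequency is left-closed by default. In this sample the row stamped exactly 2022-02-10 therefore lands in the first period under variant 1 and in the second under variant 2. Note also that 'first' skips missing values, so a NaN at the start of a period would be passed over.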