I have a DataFrame that is a time series every minutes (duration: up to today) All ts are in datetime64[ns]
Input:
timestamp A
2021-01-01 00:00:00.000 2000
2021-01-01 00:01:00.000 2005
2021-01-01 00:02:00.000 2010
...
I create a new column with categorical data consisting of periods of 40 days like below: ( I am using pd.cut) , output is:
timestamp A period
2021-01-01 00:00:00.000 2000 (2021-12-31 23:59:59.999999999, 2022-02-10]
2021-01-01 00:01:00.000 2005 (2021-12-31 23:59:59.999999999, 2022-02-10]
2021-01-01 00:02:00.000 2010 (2021-12-31 23:59:59.999999999, 2022-02-10]
...
The set of period is as below:
[(2021-12-31 23:59:59.999999999, 2022-02-10] < (2022-02-10, 2022-03-22] < (2022-03-22, 2022-05-01]]
I want to create a new column in which I get the first value of this category For example:
timestamp A period A_ref
2021-01-01 00:00:00.000 2000 (2021-12-31 23:59:59.999999999, 2022-02-10] 2000
2021-01-01 00:01:00.000 2005 (2021-12-31 23:59:59.999999999, 2022-02-10] 2000
2021-01-01 00:02:00.000 2010 (2021-12-31 23:59:59.999999999, 2022-02-10] 2000
...
It could be that the beginning of the period is not an exact match.
What I have tried: df.resample('40D') taking 'first' as aggregator. Doing this I can get the value I am interested in but I really want is fill this value across all of the rows. Maybe the approach of using a category is not necessary?
Thank you
CodePudding user response:
df.merge(df.groupby('period')['A'].first().reset_index(), on='period', suffixes = ('', '_ref'))
timestamp ... A_ref
0 2021-01-01 00:00:00.000 ... 2000
1 2021-01-01 00:01:00.000 ... 2000
2 2021-01-01 00:02:00.000 ... 2000