How to resample to a coarser resolution but keep samples from the original index?

I have the following use case:

import pandas as pd
import numpy as np

# create dataframe
df = pd.DataFrame(data=np.random.rand(10, 2),
                  columns=['a', 'b'],
                  index=pd.date_range('2021-01-01', periods=10, freq='W-FRI'))
# the data is random; to save time, I copy-pasted rows in the printed output below
df
>               a          b
> 2021-01-01    0.272628   0.974373
> 2021-01-08    0.272628   0.974373
> 2021-01-15    0.272628   0.974373
> 2021-01-22    0.272628   0.974373
> 2021-01-29    0.272628   0.974373
> 2021-02-05    0.759018   0.443803
> 2021-02-12    0.759018   0.443803
> 2021-02-19    0.759018   0.443803
> 2021-02-26    0.759018   0.443803
> 2021-03-05    0.973900   0.929002

When I resample, I would like to get the first matching sample from my original index. The following doesn't work, because the resulting dates (month ends) aren't in my original index:

df.resample('M').first()
>               a          b
> 2021-01-31    0.272628   0.974373
> 2021-02-28    0.759018   0.443803
> 2021-03-31    0.973900   0.929002

I'd like to resample to monthly frequency but keep the first actual date in each month, i.e., I would like the following result:

>               a          b
> 2021-01-01    0.272628   0.974373
> 2021-02-05    0.759018   0.443803
> 2021-03-05    0.973900   0.929002

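I could use the following hack, but it is not ideal because it only works for this toy example. It drops rows as soon as the data doesn't start in January, crosses a year boundary, or skips a month: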

df.loc[list(np.diff(df.index.month.values, prepend=0) == 1)]
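To illustrate why the hack is fragile, here is a small check on made-up data (the December 2020 to January 2021 index and the values are assumptions for illustration only). It compares the hack with a comparison of calendar months via `Index.to_period('M')`:

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not from the question): weekly Fridays
# from December 2020 into January 2021
idx = pd.date_range('2020-12-04', periods=8, freq='W-FRI')
df = pd.DataFrame({'a': np.arange(8.0), 'b': np.arange(8.0) * 10}, index=idx)

# The hack from the question: keep rows where the month number rises by exactly 1.
# December gives 12 - 0 = 12 and January gives 1 - 12 = -11, so nothing matches.
hack = df.loc[list(np.diff(df.index.month.values, prepend=0) == 1)]
print(hack.empty)  # True

# Comparing calendar months instead of month numbers keeps one row per month
fixed = df[~df.index.to_period('M').duplicated()]
print(fixed.index.tolist())
```

The hack returns an empty frame here. The period-based version keeps 2020-12-04 and 2021-01-01.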

Answer:

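One way is to convert the index to monthly periods, then drop the duplicates:

# map each timestamp to its calendar month, e.g. 2021-01-15 -> 2021-01
months = df.index.to_series().dt.to_period('M')
# keep only the first row of each month
df[~months.duplicated()]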
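A self-contained run of the period approach on the question's index might look like this. The seeded generator is an assumption added only to make the run reproducible. Note that `duplicated()` keeps the first occurrence in row order, so the index should be sorted:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the run is reproducible
df = pd.DataFrame(rng.random((10, 2)), columns=['a', 'b'],
                  index=pd.date_range('2021-01-01', periods=10, freq='W-FRI'))

# Map every timestamp to its calendar month, e.g. 2021-01-15 -> 2021-01
months = df.index.to_series().dt.to_period('M')

# duplicated() flags every repeat after the first occurrence, so negating it
# keeps the earliest row of each month (assuming the index is sorted)
first_per_month = df[~months.duplicated()]
print(first_per_month)
```

The resulting index is 2021-01-01, 2021-02-05 and 2021-03-05, i.e. dates taken from the original index rather than month-end labels.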

Another option, which might actually be better, is groupby().head():

df.groupby(pd.Grouper(freq='M')).head(1)
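`head(1)` returns the first row of each group *in the frame's current order* and keeps the original index labels. On an unsorted index you would therefore sort first. A sketch with made-up, shuffled data (an assumption for illustration). It uses the month-start alias `'MS'`, which produces the same monthly bins as `'M'`; recent pandas versions deprecate `'M'` for offsets in favour of `'ME'`, while `'MS'` is accepted by both older and newer versions:

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2021-01-01', periods=10, freq='W-FRI')
df = pd.DataFrame({'a': np.arange(10.0), 'b': np.arange(10.0) ** 2}, index=idx)

# Shuffle the rows to mimic an unsorted index (illustrative assumption)
shuffled = df.sample(frac=1, random_state=0)

# Sort by date so head(1) picks the earliest row in each monthly group;
# the returned rows keep their original timestamps as index labels
first_rows = shuffled.sort_index().groupby(pd.Grouper(freq='MS')).head(1)
print(first_rows)
```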

Output:

                   a         b
2021-01-01  0.695784  0.228550
2021-02-05  0.188707  0.278871
2021-03-05  0.935635  0.785341
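If you prefer to stay close to the `resample` call from the question, one possible variant (not from the answer above) is to resample the timestamps themselves and then look the rows up with `.loc`. The data below is made up for illustration, and `'MS'` is used instead of `'M'` for the version reasons noted in the previous example:

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2021-01-01', periods=10, freq='W-FRI')
df = pd.DataFrame({'a': np.arange(10.0), 'b': np.arange(10.0) * 2}, index=idx)

# Resample a Series whose *values* are the original timestamps: first() then
# returns the earliest real timestamp that falls in each monthly bin
first_dates = df.index.to_series().resample('MS').first()

# Months without any rows would yield NaT, so drop them before the lookup
result = df.loc[first_dates.dropna().values]
print(result)
```

Because the lookup uses actual index values, `result` is labelled 2021-01-01, 2021-02-05 and 2021-03-05.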