I have a dataset of daily measurements with a datetime index. I am trying to resample it to monthly frequency by taking only the data from the first available day of each month.
DataFrame df:
2010-10-04 NaN 4
2010-10-05 3 5
2010-10-06 5 2
I tried using
df.resample("MS").first()
But that ends up giving me
2010-10-01 3 4
instead of
2010-10-04 NaN 4
Can I avoid dropping NaN values? I couldn't find a suitable parameter in the documentation.
CodePudding user response:
IIUC, you need the first row for each month in your dataset. I extended your example with more months, taken from different years.
Try grouping by month and taking head(1) of each group, assuming the data is sorted by date.
# A B
# 2010-10-04 NaN 4.0 <----
# 2010-10-05 3.0 5.0
# 2010-10-06 5.0 2.0
# 2010-09-05 NaN NaN <----
# 2010-09-05 3.0 5.0
# 2010-09-06 5.0 2.0
# 2019-10-04 7.0 7.0 <----
# 2019-10-05 3.0 5.0
# 2019-10-06 5.0 2.0
df.groupby(pd.Grouper(freq="M")).head(1)
A B
2010-09-05 NaN NaN
2010-10-04 NaN 4.0
2019-10-04 7.0 7.0
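For reference, here is a minimal, self-contained sketch that contrasts the two approaches on the three rows from the question. The column names "A" and "B" are assumed, since the question does not give them. As far as I know, first() returns the first non-null value of each column separately and labels the result with the bin start, which is why the NaN and the original date both disappear. The sketch uses freq="MS" in the Grouper because recent pandas releases deprecate the "M" alias for offsets in favor of "ME"; either one defines calendar-month bins here.

```python
import numpy as np
import pandas as pd

# Rebuild the question's three rows; the column names "A" and "B" are assumed.
df = pd.DataFrame(
    {"A": [np.nan, 3.0, 5.0], "B": [4.0, 5.0, 2.0]},
    index=pd.to_datetime(["2010-10-04", "2010-10-05", "2010-10-06"]),
)

# first() picks the first non-null value in each column separately,
# and the result is labelled with the start of the monthly bin.
per_column_first = df.resample("MS").first()

# head(1) keeps the first row of each month intact, NaN included,
# together with its original timestamp. Sorting first makes
# "first row" mean "earliest date".
first_rows = df.sort_index().groupby(pd.Grouper(freq="MS")).head(1)

print(per_column_first)
print(first_rows)
```

If the index is not guaranteed to be sorted, the sort_index() call matters, because head(1) takes whichever row comes first in each group.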