I have the following dataframe:
import pandas as pd
from numpy import nan
df = pd.DataFrame({'Date': ['2014-09-30', '2014-10-01',
                            '2014-10-31', '2014-11-01'],
                   'X1': [20, nan, 19, nan],
                   'X2': [nan, 2, nan, 4],
                   'X3': [5, nan, 9, nan],
                   })
         Date   X1   X2   X3
0  2014-09-30   20  nan    5
1  2014-10-01  nan    2  nan
2  2014-10-31   19  nan    9
3  2014-11-01  nan    4  nan
As you can see, the columns have different release dates: X1 and X3 are published on the last day of the month, while X2 comes out one day later. I want a single dataframe with one row per month, dated at the end of the month, that holds the value of each variable. The expected output is:
         Date  X1  X2  X3
0  2014-09-30  20   2   5
1  2014-10-31  19   4   9
Can anyone help me get it?
Thanks!
CodePudding user response:
You can use pandas.tseries.offsets.MonthEnd to build a month-end grouping key, then groupby on that key and take the first non-null value of each column with first:
from pandas.tseries.offsets import MonthEnd
# dates already on a month end are kept; any other date maps to the previous month end
g = pd.to_datetime(df['Date']) + MonthEnd(1) - MonthEnd()
df.groupby(g, as_index=False).first().convert_dtypes()
output:
         Date  X1  X2  X3
0  2014-09-30  20   2   5
1  2014-10-31  19   4   9
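To make the grouping key easier to inspect, here is a self-contained sketch of the same idea. It is a variation, not the exact code above: it assumes you are happy to drop the string Date column and use the computed month-end key as the new Date, and the names dates, key and result are only illustrative.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

df = pd.DataFrame({
    'Date': ['2014-09-30', '2014-10-01', '2014-10-31', '2014-11-01'],
    'X1': [20, np.nan, 19, np.nan],
    'X2': [np.nan, 2, np.nan, 4],
    'X3': [5, np.nan, 9, np.nan],
})

dates = pd.to_datetime(df['Date'])

# Adding one MonthEnd and subtracting it again leaves month-end dates
# unchanged and sends every other date back to the previous month end.
key = dates + MonthEnd(1) - MonthEnd(1)

result = (
    df.drop(columns='Date')   # the key replaces the original Date column
      .groupby(key)
      .first()                # first non-null value per column in each group
      .reset_index()          # the key (still named 'Date') becomes a column
      .convert_dtypes()       # restore integer dtypes where possible
)
print(key.tolist())
print(result)
```

Printing key shows which rows end up in the same group, which is a quick way to check that late releases are attached to the month you expect.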
CodePudding user response:
You can also back-fill the NaN values with bfill and then keep only the rows that fall on the last day of a month, using the dt.is_month_end attribute:
(df.bfill()  # fillna(method='bfill') is deprecated in recent pandas versions
[pd.to_datetime(df['Date']).dt.is_month_end]
.reset_index(drop=True)
)
Output:
         Date  X1  X2  X3
0  2014-09-30  20   2   5
1  2014-10-31  19   4   9
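For completeness, a runnable sketch of the back-fill approach on the sample data. The names filled and result are illustrative, and the convert_dtypes call is my addition, since back-filling a column that contained NaN leaves it as float. Note that bfill takes the next non-null value no matter how far below it is, so this assumes every late release really belongs to the preceding month end.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': ['2014-09-30', '2014-10-01', '2014-10-31', '2014-11-01'],
    'X1': [20, np.nan, 19, np.nan],
    'X2': [np.nan, 2, np.nan, 4],
    'X3': [5, np.nan, 9, np.nan],
})
df['Date'] = pd.to_datetime(df['Date'])

# Each NaN takes the next non-null value found below it in the same column.
filled = df.bfill()

result = (
    filled[filled['Date'].dt.is_month_end]  # keep only month-end rows
    .reset_index(drop=True)
    .convert_dtypes()                       # floats back to integers where possible
)
print(result)
```

If a variable could be missing for a whole month, something like bfill(limit=1) would stop a value from a later month being pulled in, at the cost of leaving a missing value in that row.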