How to adjust this dataframe?

I have the following dataframe:

import pandas as pd
from numpy import nan

df = pd.DataFrame({'Date': ['2014-09-30', '2014-10-01',
                            '2014-10-31', '2014-11-01'],
                   'X1': [20, nan, 19, nan],
                   'X2': [nan, 2, nan, 4],
                   'X3': [5, nan, 9, nan],
                   })

         Date   X1   X2   X3
0  2014-09-30   20  nan    5
1  2014-10-01  nan    2  nan
2  2014-10-31   19  nan    9
3  2014-11-01  nan    4  nan

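As you can see, the issue is that the columns are released at different times: X1 and X3 are reported on the last day of the month, while X2 is reported on the first day of the following month. I want to create a single dataframe with one row per end-of-month date that holds every variable. The outcome should therefore be: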


         Date   X1   X2   X3
0  2014-09-30   20    2    5
1  2014-10-31   19    4    9

Can anyone help me get it?

Thanks!

CodePudding user response:

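You can use pandas.tseries.offsets.MonthEnd to map each date to the month-end on or before it, then group by that key and take the first non-null value of each column: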

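from pandas.tseries.offsets import MonthEnd
# Map each date to the month-end on or before it (2014-10-01 -> 2014-09-30).
g = pd.to_datetime(df['Date']) + MonthEnd(1) - MonthEnd()
# first() keeps the first non-null value of each column per group.
df.groupby(g, as_index=False).first().convert_dtypes()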

Output:

         Date  X1  X2  X3
0  2014-09-30  20   2   5
1  2014-10-31  19   4   9
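
The date arithmetic is the key step, so it may help to look at it in isolation. The following is only an illustrative sketch: the names key and out are made up for this example, and it deliberately differs from the snippet above by dropping the original string Date column and using the computed key as the new Date, so the result carries real datetime values. The numeric columns stay as floats unless you add convert_dtypes().

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

df = pd.DataFrame({'Date': ['2014-09-30', '2014-10-01',
                            '2014-10-31', '2014-11-01'],
                   'X1': [20, np.nan, 19, np.nan],
                   'X2': [np.nan, 2, np.nan, 4],
                   'X3': [5, np.nan, 9, np.nan]})

# Roll forward to the next month-end, then step back one month-end.
# A date already on a month-end ends up unchanged; any other date
# lands on the previous month-end.
key = pd.to_datetime(df['Date']) + MonthEnd(1) - MonthEnd(1)

# Group the value columns by that key (hypothetical variant: the key
# replaces the original Date column instead of sitting next to it).
out = (df.drop(columns='Date')
         .groupby(key.rename('Date'))
         .first()          # first non-null value per column and group
         .reset_index())

print(key.dt.strftime('%Y-%m-%d').tolist())
print(out)
```

Rows 0 and 1 share the key 2014-09-30 and rows 2 and 3 share 2014-10-31, which is why each group can supply one value for every column.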

CodePudding user response:

You can also back-fill all NaN values with bfill and then keep only the rows that fall on the last day of a month, using the dt.is_month_end attribute:

# df.bfill() replaces df.fillna(method='bfill'), which is deprecated in recent pandas.
(df.bfill()
   [pd.to_datetime(df['Date']).dt.is_month_end]
   .reset_index(drop=True)
)

Output:

          Date   X1  X2  X3
0   2014-09-30   20   2   5
1   2014-10-31   19   4   9
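
This approach relies on two things: the rows are sorted by date, and each late value sits in the row directly after its month-end row. Here is a self-contained sketch under those assumptions. The limit=1 argument and the trailing convert_dtypes() call are additions for illustration, not part of the answer above, and the names filled, is_month_end and out are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': ['2014-09-30', '2014-10-01',
                            '2014-10-31', '2014-11-01'],
                   'X1': [20, np.nan, 19, np.nan],
                   'X2': [np.nan, 2, np.nan, 4],
                   'X3': [5, np.nan, 9, np.nan]})

# Pull each missing value up from the row directly below it.
# limit=1 (an assumption for this sketch) stops a value from being
# carried back across more than one row.
filled = df.bfill(limit=1)

# Boolean mask: True where the date is the last day of its month.
is_month_end = pd.to_datetime(df['Date']).dt.is_month_end

out = (filled[is_month_end]
       .reset_index(drop=True)
       .convert_dtypes())   # turn whole-number floats into nullable integers

print(out)
```

Unlike the groupby approach, this only keeps rows whose date is already a month-end, so a month with no month-end row in the data would be dropped entirely.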