I've been staring at this way too long and I think Ive lost my mind, it really shouldn't be as complicated as I'm making it.
I have a df:
Date1 | Date2 |
---|---|
2022-04-01 | 2022-06-17 |
2022-04-15 | 2022-04-15 |
2022-03-03 | NaT |
2022-04-22 | NaT |
2022-05-06 | 2022-06-06 |
I want to fill the blanks in 'Date2' where it keeps the values from 'Date2' if they are present but if 'Date2' is NaT then I want it to be the last date of the subsequent month from 'Date1'.
In the example above, the 2 NaT fields would become:
Date1 | Date2 |
---|---|
2022-03-03 | 2022-04-30 |
2022-04-22 | 2022-05-31 |
I know I have to use .fillna and the closest I've come is this:
df['Date2'] = (df['Date2'].fillna((df['Date1'] pd.DateOffset(months=1)).replace)).to_numpy().astype('datetime64[M]')
This returns the first of the month. However, it returns the first of the month for all rows (not just NaT rows) and it is returning the first of the month as opposed to the last of the month.
I'm pretty sure my parenthesis are messed up and I've tried many different combinations of - timedelta and similar.
What am I doing wrong here? TIA!
CodePudding user response:
Your question can be interpreted in two ways given the provided example.
End of month of the next row's date 1 (which now does not seem to be what you want)
You need to use pd.offses.MonthEnd
and shift
df['Date2'] = (df['Date2']
.fillna(df['Date1'].add(pd.offsets.MonthEnd())
.shift(-1))
)
Next month's end (same row)
If you want the next month end of the same row:
df['Date2'] = (df['Date2']
.fillna(df['Date1'].add(pd.offsets.MonthEnd(2)))
)
Output:
Date1 Date2
0 2022-04-01 2022-06-17
1 2022-04-15 2022-04-15
2 2022-03-03 2022-04-30
3 2022-04-22 2022-05-31
4 2022-05-06 2022-06-06
CodePudding user response:
Use MonthEnd
and loc
:
from pandas.tseries.offsets import MonthEnd
>>> df.loc[df['Date2'].isnull(), 'Date2'] = df['Date1'] pd.DateOffset(months=1) MonthEnd(1)
CodePudding user response:
Use MonthEnd
with an offset of 2 (current month and next month):
df['Date2'] = df['Date2'].fillna(df['Date1'].add(pd.offsets.MonthEnd(2)))
print(df)
# Output
Date1 Date2
0 2022-04-01 2022-06-17
1 2022-04-15 2022-04-15
2 2022-03-03 2022-04-30
3 2022-04-22 2022-05-31
4 2022-05-06 2022-06-06