I have a pandas dataframe that consists of a date column of one year and then daily data to go along with it. I want to update the year of just the rows that pertain to January. I can select the January subset of my dataframe just fine, and try to change the year of that subset based on the answer given here, but when I try to update the values of that subset by adding an offset I get an error.
Setup:
import pandas as pd
df = pd.DataFrame({'Date': pd.date_range(start = "01-01-2023", end = "12-31-2023"), 'data': 25})
Select January subset:
df[df['Date'].dt.month == 1]
This works as expected:
Date data
0 2023-01-01 25
1 2023-01-02 25
2 2023-01-03 25
3 2023-01-04 25
4 2023-01-05 25
5 2023-01-06 25
6 2023-01-07 25
7 2023-01-08 25
8 2023-01-09 25
9 2023-01-10 25
10 2023-01-11 25
11 2023-01-12 25
12 2023-01-13 25
13 2023-01-14 25
14 2023-01-15 25
15 2023-01-16 25
16 2023-01-17 25
17 2023-01-18 25
18 2023-01-19 25
19 2023-01-20 25
20 2023-01-21 25
21 2023-01-22 25
22 2023-01-23 25
23 2023-01-24 25
24 2023-01-25 25
25 2023-01-26 25
26 2023-01-27 25
27 2023-01-28 25
28 2023-01-29 25
29 2023-01-30 25
30 2023-01-31 25
Attempt to change:
df[df['Date'].dt.month == 1] = df[df['Date'].dt.month == 1] pd.offsets.DateOffset(years=1)
TypeError: Concatenation operation is not implemented for NumPy arrays, use np.concatenate() instead. Please do not rely on this error; it may not be given on all Python implementations.
I've tried a few different variations of this but seem to be having issues changing the subset dataframe data.
CodePudding user response:
You have to select Date
column (solution enhanced by @mozway, thanks):
df.loc[df['Date'].dt.month == 1, 'Date'] = pd.offsets.DateOffset(years=1)
print(df)
# Output
Date data
0 2024-01-01 25
1 2024-01-02 25
2 2024-01-03 25
3 2024-01-04 25
4 2024-01-05 25
.. ... ...
360 2023-12-27 25
361 2023-12-28 25
362 2023-12-29 25
363 2023-12-30 25
364 2023-12-31 25
[365 rows x 2 columns]