I have a Python dataframe (8000 rows) with a datetime format column, which has dates like YYYY-MM-DD. I am looking to change it from being a single date to multiple months, and years, with same day.
My Output:
0 data-1 2011-12-03
1 data-2 2011-12-03
2 data-3 2011-12-03
..
..
data-4 2011-12-03
data-5 2011-12-03
7999 data-6 2011-12-03
Expected output:
val1 date
0 data-1 2009-01-03
.. 2009-02-03
2009-03-03
..
..
2009-11-03
2009-12-03
11 data-n 2010-01-03
.. 2010-02-03
2010-03-03
..
..
2010-11-03
2010-12-03
2011-01-03
.. 2011-02-03
2011-03-03
..
..
2011-11-03
7999 data-m 2011-12-03
I want it to spread over the 12 months and 5 years. I tried:
df.date[0:999] = pd.to_datetime(stallion_df.date[0:999]) pd.offsets.DateOffset(years=1)
df.date[0:999] = pd.to_datetime(stallion_df.date[0:999]) pd.offsets.DateOffset(months=3)
...
for 8000 rows for year and month, which is clearly not optimal. Any help would be much appreciated. Thanks
CodePudding user response:
Is this your expected output? It's constructing the Cartesian product of dates ranging from '2018-01-03'
to '2022-12-03'
and val
column. You have date-1
to date-m
, so I substituted m
with 100. Then you'll get 6000 rows.
m = 100
out = (pd.MultiIndex.from_product([[f'date-{i}' for i in range(1,m 1)],
pd.date_range('2018-01-01','2022-12-01', freq='MS') pd.DateOffset(days=2)])
.to_frame(name=['val','date']).reset_index(drop=True))
Output:
val date
0 date-1 2018-01-03
1 date-1 2018-02-03
2 date-1 2018-03-03
3 date-1 2018-04-03
4 date-1 2018-05-03
... ... ...
5995 date-100 2022-08-03
5996 date-100 2022-09-03
5997 date-100 2022-10-03
5998 date-100 2022-11-03
5999 date-100 2022-12-03
[6000 rows x 2 columns]
CodePudding user response:
You may basically use date_range to create a series to be added:
dates = pd.Series(pd.date_range(start='1/3/2009', end='3/11/2011', freq='M'))
Just replace it with your desired start and end date. It will automatically increase months and years by keeping the last day of each month. If you want to set the day to a specific one:
dates.apply(lambda x: x.replace(day=3))
This returns the same but with day 3 for all entries. If you also want a larger series with repeated days, you may use repeat as:
dates.repeat(10).reset_index(drop=True)
So this way you will have the same series but each date is repeated 10 times.