I have a dataset that looks like this:
df = pd.DataFrame({
'weekstart':['01-Jan-18','08-Jan-18','15-Jan-18','22-Jan-18'],
'weekend':['07-Jan-18','14-Jan-18','21-Jan-18','28-Jan-18'],
'Spend':[34235.922,37359.6048,38916.1164,36903.8628],
'Daily':[4890.846,5337.086,5559.445,5271.98],
})
I would like to expand this out to take the daily value and create a new dataset with daily dates to form something like this:
Thank you!!
CodePudding user response:
- Use
pd.date_range
to create a column of all dates between "weekstart" and "weekend" - Use
explode
to split into individual rows.
df["days"] = df.apply(lambda x: pd.date_range(x["weekstart"], x["weekend"], freq="D"), axis=1)
df = df.explode("days")
output = df[["days", "Daily"]]
>>> output
days Daily
0 2018-01-01 4890.846
0 2018-01-02 4890.846
0 2018-01-03 4890.846
0 2018-01-04 4890.846
0 2018-01-05 4890.846
0 2018-01-06 4890.846
0 2018-01-07 4890.846
1 2018-01-08 5337.086
1 2018-01-09 5337.086
1 2018-01-10 5337.086
1 2018-01-11 5337.086
1 2018-01-12 5337.086
1 2018-01-13 5337.086
1 2018-01-14 5337.086
2 2018-01-15 5559.445
2 2018-01-16 5559.445
2 2018-01-17 5559.445
2 2018-01-18 5559.445
2 2018-01-19 5559.445
2 2018-01-20 5559.445
2 2018-01-21 5559.445
3 2018-01-22 5271.980
3 2018-01-23 5271.980
3 2018-01-24 5271.980
3 2018-01-25 5271.980
3 2018-01-26 5271.980
3 2018-01-27 5271.980
3 2018-01-28 5271.980
CodePudding user response:
It seems like an easy problem. Iterate the existing dataframe using df.iterrows() and then get the current date range then use this range to iterate a range of dates and use these dates to populate another data frame with the detail dates and amounts. I iteriate the dataframe because I am expanding the data into a new dataframe rather than added a new column using a dataframe explode
df = pd.DataFrame({
'weekstart':['01-Jan-18','08-Jan-18','15-Jan-18','22-Jan-18'],
'weekend':['07-Jan-18','14-Jan-18','21-Jan-18','28-Jan-18'],
'Spend':[34235.922,37359.6048,38916.1164,36903.8628],
'Daily':[4890.846,5337.086,5559.445,5271.98],
})
#for key, value in df.iterrows():
# fill a range with weekstart and weekend date ranges:
# iterate the range and populate the new data frame
df2=pd.DataFrame(columns=['Date','Amount'])
df2['Date']=pd.to_datetime(df2['Date'])
for key,value in df.iterrows():
amount=value['Daily']
for date in pd.date_range(value['weekstart'], value['weekend']):
df2=df2.append({"Date":date, "Amount":amount},ignore_index=True)
index=range(1,len(df2) 1)
df2.set_index(pd.Index(index),'index',inplace=True)
plt.plot(df2['Date'],df2['Amount'])
plt.xticks(rotation=90)
plt.show()
print(df2)
output:
Date Amount
1 2018-01-01 4890.846
2 2018-01-02 4890.846
3 2018-01-03 4890.846
4 2018-01-04 4890.846
5 2018-01-05 4890.846
6 2018-01-06 4890.846
7 2018-01-07 4890.846
8 2018-01-08 5337.086
9 2018-01-09 5337.086
10 2018-01-10 5337.086
11 2018-01-11 5337.086
12 2018-01-12 5337.086
13 2018-01-13 5337.086
14 2018-01-14 5337.086
15 2018-01-15 5559.445
16 2018-01-16 5559.445
17 2018-01-17 5559.445
18 2018-01-18 5559.445
19 2018-01-19 5559.445
20 2018-01-20 5559.445
21 2018-01-21 5559.445
22 2018-01-22 5271.980
23 2018-01-23 5271.980
24 2018-01-24 5271.980
25 2018-01-25 5271.980
26 2018-01-26 5271.980
27 2018-01-27 5271.980
28 2018-01-28 5271.980