Home > database >  How to expand and create the following dataset in Pandas
How to expand and create the following dataset in Pandas

Time:11-09

I have a dataset that looks like this:

enter image description here

df = pd.DataFrame({
    'weekstart':['01-Jan-18','08-Jan-18','15-Jan-18','22-Jan-18'],
    'weekend':['07-Jan-18','14-Jan-18','21-Jan-18','28-Jan-18'],
    'Spend':[34235.922,37359.6048,38916.1164,36903.8628],
    'Daily':[4890.846,5337.086,5559.445,5271.98],
})

I would like to expand this out to take the daily value and create a new dataset with daily dates to form something like this:

enter image description here

Thank you!!

CodePudding user response:

  1. Use pd.date_range to create a column of all dates between "weekstart" and "weekend"
  2. Use explode to split into individual rows.
df["days"] = df.apply(lambda x: pd.date_range(x["weekstart"], x["weekend"], freq="D"), axis=1)
df = df.explode("days")
output = df[["days", "Daily"]]

>>> output

        days     Daily
0 2018-01-01  4890.846
0 2018-01-02  4890.846
0 2018-01-03  4890.846
0 2018-01-04  4890.846
0 2018-01-05  4890.846
0 2018-01-06  4890.846
0 2018-01-07  4890.846
1 2018-01-08  5337.086
1 2018-01-09  5337.086
1 2018-01-10  5337.086
1 2018-01-11  5337.086
1 2018-01-12  5337.086
1 2018-01-13  5337.086
1 2018-01-14  5337.086
2 2018-01-15  5559.445
2 2018-01-16  5559.445
2 2018-01-17  5559.445
2 2018-01-18  5559.445
2 2018-01-19  5559.445
2 2018-01-20  5559.445
2 2018-01-21  5559.445
3 2018-01-22  5271.980
3 2018-01-23  5271.980
3 2018-01-24  5271.980
3 2018-01-25  5271.980
3 2018-01-26  5271.980
3 2018-01-27  5271.980
3 2018-01-28  5271.980

CodePudding user response:

It seems like an easy problem. Iterate the existing dataframe using df.iterrows() and then get the current date range then use this range to iterate a range of dates and use these dates to populate another data frame with the detail dates and amounts. I iteriate the dataframe because I am expanding the data into a new dataframe rather than added a new column using a dataframe explode

df  = pd.DataFrame({
'weekstart':['01-Jan-18','08-Jan-18','15-Jan-18','22-Jan-18'],
'weekend':['07-Jan-18','14-Jan-18','21-Jan-18','28-Jan-18'],
'Spend':[34235.922,37359.6048,38916.1164,36903.8628],
'Daily':[4890.846,5337.086,5559.445,5271.98],
})

#for key, value in df.iterrows():
#   fill a range with weekstart and weekend date ranges:
#   iterate the range and populate the new data frame 

df2=pd.DataFrame(columns=['Date','Amount'])
df2['Date']=pd.to_datetime(df2['Date'])

for key,value in df.iterrows():
    amount=value['Daily']
    for date in pd.date_range(value['weekstart'], value['weekend']):
        df2=df2.append({"Date":date, "Amount":amount},ignore_index=True)

index=range(1,len(df2) 1) 
df2.set_index(pd.Index(index),'index',inplace=True)

plt.plot(df2['Date'],df2['Amount'])
plt.xticks(rotation=90)
plt.show()

print(df2)

output:

Date    Amount
1  2018-01-01  4890.846
2  2018-01-02  4890.846
3  2018-01-03  4890.846
4  2018-01-04  4890.846
5  2018-01-05  4890.846
6  2018-01-06  4890.846
7  2018-01-07  4890.846
8  2018-01-08  5337.086
9  2018-01-09  5337.086
10 2018-01-10  5337.086
11 2018-01-11  5337.086
12 2018-01-12  5337.086
13 2018-01-13  5337.086
14 2018-01-14  5337.086
15 2018-01-15  5559.445
16 2018-01-16  5559.445
17 2018-01-17  5559.445
18 2018-01-18  5559.445
19 2018-01-19  5559.445
20 2018-01-20  5559.445
21 2018-01-21  5559.445
22 2018-01-22  5271.980
23 2018-01-23  5271.980
24 2018-01-24  5271.980
25 2018-01-25  5271.980
26 2018-01-26  5271.980
27 2018-01-27  5271.980
28 2018-01-28  5271.980
  • Related