Hello dear community,
I cannot really figure out the following problem:
I have this data:
start end nominal
1 8/23/2021 9/15/2021 10000
2 9/1/2021 9/15/2021 100500
3 9/2/2021 9/15/2021 30000
4 9/3/2021 9/15/2021 2200
5 9/6/2021 9/15/2021 5700
6 9/7/2021 9/15/2021 6050
I want to transform it into:
date 1 2 3 4 5 6
2021-08-23 00:00:00 1000
2021-08-24 00:00:00 1000
2021-08-25 00:00:00 1000
2021-08-26 00:00:00 1000
2021-08-27 00:00:00 1000
2021-08-28 00:00:00 1000
2021-08-29 00:00:00 1000
2021-08-30 00:00:00 1000
2021-08-31 00:00:00 1000
2021-09-01 00:00:00 1000 10050
2021-09-02 00:00:00 1000 10050 3000
2021-09-03 00:00:00 1000 10050 3000 2200
2021-09-04 00:00:00 1000 10050 3000 2200
2021-09-05 00:00:00 1000 10050 3000 2200
2021-09-06 00:00:00 1000 10050 3000 2200 5700
2021-09-07 00:00:00 1000 10050 3000 2200 5700 6050
2021-09-08 00:00:00 1000 10050 3000 2200 5700 6050
2021-09-09 00:00:00 1000 10050 3000 2200 5700 6050
2021-09-10 00:00:00 1000 10050 3000 2200 5700 6050
2021-09-11 00:00:00 1000 10050 3000 2200 5700 6050
2021-09-12 00:00:00 1000 10050 3000 2200 5700 6050
2021-09-13 00:00:00 1000 10050 3000 2200 5700 6050
2021-09-14 00:00:00 1000 10050 3000 2200 5700 6050
2021-09-15 00:00:00 1000 10050 3000 2200 5700 6050
So that I generate the data range starting from the earliest date of the column "start" and finishing it with the very last date of the column "end". The dates inbetween I generated:
end = df.iloc[-1, 1]
start= df.iloc[0, 0]
numdays = (end datetime.timedelta(days=1)-date).days
date_list = [end - datetime.timedelta(days=x) for x in range(numdays)]
I have created a new df with nan columns:
df2= pd.DataFrame({"date":date_list })
for i in range(0, len(df.axes[0])):
df2.at[index, f"{i}"] = nan
but I cannot wrap my mind around mapping values from nominal to generated dates between start and end. I have tried using iterrows, map and even pivoting but have not gotten far in my attempts.
I would be very grateful if you could help me, the rookie
CodePudding user response:
You can create a date_range
, explode
it and pivot
:
df['date'] = [pd.date_range(a, b) for a,b in zip(df.pop('start'), df.pop('end'))]
out = (df
.explode('date')
.reset_index()
.pivot('date', 'index', 'nominal')
.reset_index().rename_axis(columns=None)
)
Output:
date 1 2 3 4 5 6
0 2021-08-23 10000.0 NaN NaN NaN NaN NaN
1 2021-08-24 10000.0 NaN NaN NaN NaN NaN
2 2021-08-25 10000.0 NaN NaN NaN NaN NaN
3 2021-08-26 10000.0 NaN NaN NaN NaN NaN
4 2021-08-27 10000.0 NaN NaN NaN NaN NaN
5 2021-08-28 10000.0 NaN NaN NaN NaN NaN
6 2021-08-29 10000.0 NaN NaN NaN NaN NaN
7 2021-08-30 10000.0 NaN NaN NaN NaN NaN
8 2021-08-31 10000.0 NaN NaN NaN NaN NaN
9 2021-09-01 10000.0 100500.0 NaN NaN NaN NaN
10 2021-09-02 10000.0 100500.0 30000.0 NaN NaN NaN
11 2021-09-03 10000.0 100500.0 30000.0 2200.0 NaN NaN
12 2021-09-04 10000.0 100500.0 30000.0 2200.0 NaN NaN
13 2021-09-05 10000.0 100500.0 30000.0 2200.0 NaN NaN
14 2021-09-06 10000.0 100500.0 30000.0 2200.0 5700.0 NaN
15 2021-09-07 10000.0 100500.0 30000.0 2200.0 5700.0 6050.0
16 2021-09-08 10000.0 100500.0 30000.0 2200.0 5700.0 6050.0
17 2021-09-09 10000.0 100500.0 30000.0 2200.0 5700.0 6050.0
18 2021-09-10 10000.0 100500.0 30000.0 2200.0 5700.0 6050.0
19 2021-09-11 10000.0 100500.0 30000.0 2200.0 5700.0 6050.0
20 2021-09-12 10000.0 100500.0 30000.0 2200.0 5700.0 6050.0
21 2021-09-13 10000.0 100500.0 30000.0 2200.0 5700.0 6050.0
22 2021-09-14 10000.0 100500.0 30000.0 2200.0 5700.0 6050.0
23 2021-09-15 10000.0 100500.0 30000.0 2200.0 5700.0 6050.0
CodePudding user response:
You can create the range of dates for each record then explode it and finally reshape your dataframe:
out = (df.assign(date=df.apply(lambda x: pd.date_range(x['start'], x['end']), axis=1))
.explode('date').reset_index().pivot('date', 'index', 'nominal')
.rename_axis(columns=None).reset_index())
print(out)
# Output
date 1 2 3 4 5 6
0 2021-08-23 10000.0 NaN NaN NaN NaN NaN
1 2021-08-24 10000.0 NaN NaN NaN NaN NaN
2 2021-08-25 10000.0 NaN NaN NaN NaN NaN
3 2021-08-26 10000.0 NaN NaN NaN NaN NaN
4 2021-08-27 10000.0 NaN NaN NaN NaN NaN
5 2021-08-28 10000.0 NaN NaN NaN NaN NaN
6 2021-08-29 10000.0 NaN NaN NaN NaN NaN
7 2021-08-30 10000.0 NaN NaN NaN NaN NaN
8 2021-08-31 10000.0 NaN NaN NaN NaN NaN
9 2021-09-01 10000.0 100500.0 NaN NaN NaN NaN
10 2021-09-02 10000.0 100500.0 30000.0 NaN NaN NaN
11 2021-09-03 10000.0 100500.0 30000.0 2200.0 NaN NaN
12 2021-09-04 10000.0 100500.0 30000.0 2200.0 NaN NaN
13 2021-09-05 10000.0 100500.0 30000.0 2200.0 NaN NaN
14 2021-09-06 10000.0 100500.0 30000.0 2200.0 5700.0 NaN
15 2021-09-07 10000.0 100500.0 30000.0 2200.0 5700.0 6050.0
16 2021-09-08 10000.0 100500.0 30000.0 2200.0 5700.0 6050.0
17 2021-09-09 10000.0 100500.0 30000.0 2200.0 5700.0 6050.0
18 2021-09-10 10000.0 100500.0 30000.0 2200.0 5700.0 6050.0
19 2021-09-11 10000.0 100500.0 30000.0 2200.0 5700.0 6050.0
20 2021-09-12 10000.0 100500.0 30000.0 2200.0 5700.0 6050.0
21 2021-09-13 10000.0 100500.0 30000.0 2200.0 5700.0 6050.0
22 2021-09-14 10000.0 100500.0 30000.0 2200.0 5700.0 6050.0
23 2021-09-15 10000.0 100500.0 30000.0 2200.0 5700.0 6050.0