Home > Enterprise >  Extrapolation and repetition of data between specific dates
Extrapolation and repetition of data between specific dates

Time:07-26

Hello dear community,

I cannot really figure out the following problem:

I have this data:

    start        end        nominal
1   8/23/2021   9/15/2021   10000
2   9/1/2021    9/15/2021   100500
3   9/2/2021    9/15/2021   30000
4   9/3/2021    9/15/2021   2200
5   9/6/2021    9/15/2021   5700
6   9/7/2021    9/15/2021   6050

I want to transform it into:

date                  1       2       3      4      5      6
2021-08-23 00:00:00 1000                    
2021-08-24 00:00:00 1000                    
2021-08-25 00:00:00 1000                    
2021-08-26 00:00:00 1000                    
2021-08-27 00:00:00 1000                    
2021-08-28 00:00:00 1000                    
2021-08-29 00:00:00 1000                    
2021-08-30 00:00:00 1000                    
2021-08-31 00:00:00 1000                    
2021-09-01 00:00:00 1000    10050               
2021-09-02 00:00:00 1000    10050   3000            
2021-09-03 00:00:00 1000    10050   3000    2200        
2021-09-04 00:00:00 1000    10050   3000    2200        
2021-09-05 00:00:00 1000    10050   3000    2200        
2021-09-06 00:00:00 1000    10050   3000    2200    5700    
2021-09-07 00:00:00 1000    10050   3000    2200    5700    6050
2021-09-08 00:00:00 1000    10050   3000    2200    5700    6050
2021-09-09 00:00:00 1000    10050   3000    2200    5700    6050
2021-09-10 00:00:00 1000    10050   3000    2200    5700    6050
2021-09-11 00:00:00 1000    10050   3000    2200    5700    6050
2021-09-12 00:00:00 1000    10050   3000    2200    5700    6050
2021-09-13 00:00:00 1000    10050   3000    2200    5700    6050
2021-09-14 00:00:00 1000    10050   3000    2200    5700    6050
2021-09-15 00:00:00 1000    10050   3000    2200    5700    6050

So that I generate the data range starting from the earliest date of the column "start" and finishing it with the very last date of the column "end". The dates inbetween I generated:

end = df.iloc[-1, 1]
start= df.iloc[0, 0]
numdays = (end datetime.timedelta(days=1)-date).days
date_list = [end - datetime.timedelta(days=x) for x in range(numdays)]

I have created a new df with nan columns:

df2= pd.DataFrame({"date":date_list })
for i in range(0, len(df.axes[0])): 
            df2.at[index, f"{i}"] = nan

but I cannot wrap my mind around mapping values from nominal to generated dates between start and end. I have tried using iterrows, map and even pivoting but have not gotten far in my attempts.

I would be very grateful if you could help me, the rookie

CodePudding user response:

You can create a date_range, explode it and pivot:

df['date'] = [pd.date_range(a, b) for a,b in zip(df.pop('start'), df.pop('end'))]

out = (df
 .explode('date')
 .reset_index()
 .pivot('date', 'index', 'nominal')
 .reset_index().rename_axis(columns=None)
 )

Output:

         date        1         2        3       4       5       6
0  2021-08-23  10000.0       NaN      NaN     NaN     NaN     NaN
1  2021-08-24  10000.0       NaN      NaN     NaN     NaN     NaN
2  2021-08-25  10000.0       NaN      NaN     NaN     NaN     NaN
3  2021-08-26  10000.0       NaN      NaN     NaN     NaN     NaN
4  2021-08-27  10000.0       NaN      NaN     NaN     NaN     NaN
5  2021-08-28  10000.0       NaN      NaN     NaN     NaN     NaN
6  2021-08-29  10000.0       NaN      NaN     NaN     NaN     NaN
7  2021-08-30  10000.0       NaN      NaN     NaN     NaN     NaN
8  2021-08-31  10000.0       NaN      NaN     NaN     NaN     NaN
9  2021-09-01  10000.0  100500.0      NaN     NaN     NaN     NaN
10 2021-09-02  10000.0  100500.0  30000.0     NaN     NaN     NaN
11 2021-09-03  10000.0  100500.0  30000.0  2200.0     NaN     NaN
12 2021-09-04  10000.0  100500.0  30000.0  2200.0     NaN     NaN
13 2021-09-05  10000.0  100500.0  30000.0  2200.0     NaN     NaN
14 2021-09-06  10000.0  100500.0  30000.0  2200.0  5700.0     NaN
15 2021-09-07  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
16 2021-09-08  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
17 2021-09-09  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
18 2021-09-10  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
19 2021-09-11  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
20 2021-09-12  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
21 2021-09-13  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
22 2021-09-14  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
23 2021-09-15  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0

CodePudding user response:

You can create the range of dates for each record then explode it and finally reshape your dataframe:

out = (df.assign(date=df.apply(lambda x: pd.date_range(x['start'], x['end']), axis=1))
         .explode('date').reset_index().pivot('date', 'index', 'nominal')
         .rename_axis(columns=None).reset_index())
print(out)

# Output
         date        1         2        3       4       5       6
0  2021-08-23  10000.0       NaN      NaN     NaN     NaN     NaN
1  2021-08-24  10000.0       NaN      NaN     NaN     NaN     NaN
2  2021-08-25  10000.0       NaN      NaN     NaN     NaN     NaN
3  2021-08-26  10000.0       NaN      NaN     NaN     NaN     NaN
4  2021-08-27  10000.0       NaN      NaN     NaN     NaN     NaN
5  2021-08-28  10000.0       NaN      NaN     NaN     NaN     NaN
6  2021-08-29  10000.0       NaN      NaN     NaN     NaN     NaN
7  2021-08-30  10000.0       NaN      NaN     NaN     NaN     NaN
8  2021-08-31  10000.0       NaN      NaN     NaN     NaN     NaN
9  2021-09-01  10000.0  100500.0      NaN     NaN     NaN     NaN
10 2021-09-02  10000.0  100500.0  30000.0     NaN     NaN     NaN
11 2021-09-03  10000.0  100500.0  30000.0  2200.0     NaN     NaN
12 2021-09-04  10000.0  100500.0  30000.0  2200.0     NaN     NaN
13 2021-09-05  10000.0  100500.0  30000.0  2200.0     NaN     NaN
14 2021-09-06  10000.0  100500.0  30000.0  2200.0  5700.0     NaN
15 2021-09-07  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
16 2021-09-08  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
17 2021-09-09  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
18 2021-09-10  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
19 2021-09-11  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
20 2021-09-12  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
21 2021-09-13  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
22 2021-09-14  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
23 2021-09-15  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
  • Related