Home > other >  Pandas dataframe to multiple adjacency matricies
Pandas dataframe to multiple adjacency matricies

Time:12-08

I need to transform a data frame into what I think are adjacency matrices or some sort of pivot table using a datetime column. I have tried a lot of googling but haven't found anything, so any help in how to do this or even what I should be googling would be appreciated.

Here is a simplified version of my data:

df = pd.DataFrame({'Location' : [1]*7   [2]*7,
                   'Postcode' : ['XXX XXX']*7   ['YYY YYY']*7,
                   'Date' : ['03-12-2021', '04-12-2021', '05-12-2021', '06-12-2021', '07-12-2021',
                             '08-12-2021', '09-12-2021', '03-12-2021', '04-12-2021', '05-12-2021',
                             '06-12-2021', '07-12-2021', '08-12-2021', '09-12-2021'],
                   'Var 1' : [6.9, 10.2, 9.2, 7.6, 9.8, 8.6, 10.6, 9.9, 9.4, 9, 9.4, 9.1, 8, 9.9],
                   'Var 2' : [14.5, 6.2, 9.7, 12.7, 14.8, 12, 12.2, 12.3, 14.2, 13.8, 11.7, 17.8,
                              10.7, 12.3]})
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)


    Location Postcode       Date  Var 1  Var 2
0          1  XXX XXX 2021-12-03    6.9   14.5
1          1  XXX XXX 2021-12-04   10.2    6.2
2          1  XXX XXX 2021-12-05    9.2    9.7
3          1  XXX XXX 2021-12-06    7.6   12.7
4          1  XXX XXX 2021-12-07    9.8   14.8
5          1  XXX XXX 2021-12-08    8.6   12.0
6          1  XXX XXX 2021-12-09   10.6   12.2
7          2  YYY YYY 2021-12-03    9.9   12.3
8          2  YYY YYY 2021-12-04    9.4   14.2
9          2  YYY YYY 2021-12-05    9.0   13.8
10         2  YYY YYY 2021-12-06    9.4   11.7
11         2  YYY YYY 2021-12-07    9.1   17.8
12         2  YYY YYY 2021-12-08    8.0   10.7
13         2  YYY YYY 2021-12-09    9.9   12.3

The output I want to create is what each variable will be in 1, 2, 3 etc days from the Date variable, so it would look like this:

enter image description here

But I have no idea how or where to start. My only thought is several for loops but in reality I have hundreds of locations and 10 variables for 14 Dates each, so it is a large dataset and this would be very inefficient. I feel like there should be a function or simpler way to achieve this.

CodePudding user response:

Create DatetimIndex and then use DataFrameGroupBy.shift withadd suffix by DataFrame.add_suffix with {i:02} for 01, 02..10, 11 for correct sorting columns names in last step:

df = df.set_index('Date')

for i in range(1,7):
    df = df.join(df.groupby('Location')[['Var 1', 'Var 2']].shift(freq=f'-{i}d')
                   .add_suffix(f'  Day {i:02}'), on=['Location','Date'])
    
df = df.set_index(['Location','Postcode'], append=True).sort_index(axis=1)
  • Related