Home > Net >  Reshape Pandas DataFrame with TimeSeries in rows instead of columns
Reshape Pandas DataFrame with TimeSeries in rows instead of columns

Time:05-09

I have a DataFrame df that contains price data (Open, Close, High, Low) for every day in the time from January 2010 to December 2021:

Name ISIN Data 02.01.2010 05.01.2010 06.01.2010 ... 31.12.2021
Apple US9835635986 Price Open 12.45 13.45 12.48 ... 54.12
Apple US9835635986 Price Close 12.58 15.35 12.38 ... 54.43
Apple US9835635986 Price High 12.78 15.85 12.83 ... 54.91
Apple US9835635986 Price Low 12.18 13.35 12.21 ... 53.98
Microsoft US1223928384 Price Open 12.45 13.45 12.48 ... 43.56
... .. ... ... ... ... ... ...

I am trying to reshape the table into the format below:

Date Name ISIN Price Open Price Close Price High Price Low
02.01.2010 Apple US9835635986 12.45 12.58 12.78 12.18
05.01.2010 Apple US9835635986 13.45 15.35 15.85 13.35
... ... ... ... ... ... ... ...
02.01.2010 Microsoft US1223928384 12.45 13.67 13.74 12.35

Simply transposing the DateFrame did not work. I also tried pivot which gave the error message that the operands ould not be broadcasted to different shapes.

dates = ['NAME','ISIN']
dates.append(df.columns.tolist()[3:]) # appends all columns names starting with 02.01.2010
df.pivot(index = dates, columns = 'Data', Values = 'Data')

How can I get this DataFrame in the desired format?

CodePudding user response:

Use DataFrame.melt before pivoting with convert datetimes, last sorting MultiIndex:

df = (df.melt(['Name','ISIN','Data'], var_name='Date')
        .assign(Date = lambda x: pd.to_datetime(x['Date'], format='%d.%m.%Y'))
        .pivot(index = ['Date','Name','ISIN'], columns = 'Data', values = 'value')
        .sort_index(level=[1,2,0])
        .reset_index()
        )
print (df)
Data       Date       Name          ISIN  Price Close  Price High  Price Low  \
0    2010-01-02      Apple  US9835635986        12.58       12.78      12.18   
1    2010-01-05      Apple  US9835635986        15.35       15.85      13.35   
2    2010-01-06      Apple  US9835635986        12.38       12.83      12.21   
3    2021-12-31      Apple  US9835635986        54.43       54.91      53.98   
4    2010-01-02  Microsoft  US1223928384          NaN         NaN        NaN   
5    2010-01-05  Microsoft  US1223928384          NaN         NaN        NaN   
6    2010-01-06  Microsoft  US1223928384          NaN         NaN        NaN   
7    2021-12-31  Microsoft  US1223928384          NaN         NaN        NaN   

Data  Price Open  
0          12.45  
1          13.45  
2          12.48  
3          54.12  
4          12.45  
5          13.45  
6          12.48  
7          43.56  

Another idea is first convert columns names for datetimes and then reshape by DataFrame.stack and Series.unstack:

L = df.columns.tolist()
df = (df.set_axis(L[:3]   pd.to_datetime(L[3:], format='%d.%m.%Y').tolist(), axis=1)
         .rename_axis('Date', axis=1)
         .set_index(L[:3])
         .stack()
         .unstack(2)
         .reorder_levels([2,0,1])
         .reset_index())
print (df)
Data       Date       Name          ISIN  Price Close  Price High  Price Low  \
0    2010-01-02      Apple  US9835635986        12.58       12.78      12.18   
1    2010-01-05      Apple  US9835635986        15.35       15.85      13.35   
2    2010-01-06      Apple  US9835635986        12.38       12.83      12.21   
3    2021-12-31      Apple  US9835635986        54.43       54.91      53.98   
4    2010-01-02  Microsoft  US1223928384          NaN         NaN        NaN   
5    2010-01-05  Microsoft  US1223928384          NaN         NaN        NaN   
6    2010-01-06  Microsoft  US1223928384          NaN         NaN        NaN   
7    2021-12-31  Microsoft  US1223928384          NaN         NaN        NaN   

Data  Price Open  
0          12.45  
1          13.45  
2          12.48  
3          54.12  
4          12.45  
5          13.45  
6          12.48  
7          43.56  
  • Related