I have a DataFrame df that contains price data (Open, Close, High, Low) for every day in the time from January 2010 to December 2021:
Name | ISIN | Data | 02.01.2010 | 05.01.2010 | 06.01.2010 | ... | 31.12.2021 |
---|---|---|---|---|---|---|---|
Apple | US9835635986 | Price Open | 12.45 | 13.45 | 12.48 | ... | 54.12 |
Apple | US9835635986 | Price Close | 12.58 | 15.35 | 12.38 | ... | 54.43 |
Apple | US9835635986 | Price High | 12.78 | 15.85 | 12.83 | ... | 54.91 |
Apple | US9835635986 | Price Low | 12.18 | 13.35 | 12.21 | ... | 53.98 |
Microsoft | US1223928384 | Price Open | 12.45 | 13.45 | 12.48 | ... | 43.56 |
... | .. | ... | ... | ... | ... | ... | ... |
I am trying to reshape the table into the format below:
Date | Name | ISIN | Price Open | Price Close | Price High | Price Low | |
---|---|---|---|---|---|---|---|
02.01.2010 | Apple | US9835635986 | 12.45 | 12.58 | 12.78 | 12.18 | |
05.01.2010 | Apple | US9835635986 | 13.45 | 15.35 | 15.85 | 13.35 | |
... | ... | ... | ... | ... | ... | ... | ... |
02.01.2010 | Microsoft | US1223928384 | 12.45 | 13.67 | 13.74 | 12.35 |
Simply transposing the DateFrame did not work. I also tried pivot which gave the error message that the operands ould not be broadcasted to different shapes.
dates = ['NAME','ISIN']
dates.append(df.columns.tolist()[3:]) # appends all columns names starting with 02.01.2010
df.pivot(index = dates, columns = 'Data', Values = 'Data')
How can I get this DataFrame in the desired format?
CodePudding user response:
Use DataFrame.melt
before pivoting with convert datetimes, last sorting MultiIndex
:
df = (df.melt(['Name','ISIN','Data'], var_name='Date')
.assign(Date = lambda x: pd.to_datetime(x['Date'], format='%d.%m.%Y'))
.pivot(index = ['Date','Name','ISIN'], columns = 'Data', values = 'value')
.sort_index(level=[1,2,0])
.reset_index()
)
print (df)
Data Date Name ISIN Price Close Price High Price Low \
0 2010-01-02 Apple US9835635986 12.58 12.78 12.18
1 2010-01-05 Apple US9835635986 15.35 15.85 13.35
2 2010-01-06 Apple US9835635986 12.38 12.83 12.21
3 2021-12-31 Apple US9835635986 54.43 54.91 53.98
4 2010-01-02 Microsoft US1223928384 NaN NaN NaN
5 2010-01-05 Microsoft US1223928384 NaN NaN NaN
6 2010-01-06 Microsoft US1223928384 NaN NaN NaN
7 2021-12-31 Microsoft US1223928384 NaN NaN NaN
Data Price Open
0 12.45
1 13.45
2 12.48
3 54.12
4 12.45
5 13.45
6 12.48
7 43.56
Another idea is first convert columns names for datetimes and then reshape by DataFrame.stack
and Series.unstack
:
L = df.columns.tolist()
df = (df.set_axis(L[:3] pd.to_datetime(L[3:], format='%d.%m.%Y').tolist(), axis=1)
.rename_axis('Date', axis=1)
.set_index(L[:3])
.stack()
.unstack(2)
.reorder_levels([2,0,1])
.reset_index())
print (df)
Data Date Name ISIN Price Close Price High Price Low \
0 2010-01-02 Apple US9835635986 12.58 12.78 12.18
1 2010-01-05 Apple US9835635986 15.35 15.85 13.35
2 2010-01-06 Apple US9835635986 12.38 12.83 12.21
3 2021-12-31 Apple US9835635986 54.43 54.91 53.98
4 2010-01-02 Microsoft US1223928384 NaN NaN NaN
5 2010-01-05 Microsoft US1223928384 NaN NaN NaN
6 2010-01-06 Microsoft US1223928384 NaN NaN NaN
7 2021-12-31 Microsoft US1223928384 NaN NaN NaN
Data Price Open
0 12.45
1 13.45
2 12.48
3 54.12
4 12.45
5 13.45
6 12.48
7 43.56