I try to figure out an efficient way to create a DataFrame of returns (ReturnTable) based on prices (PriceTable). Is there a more efficient way than just iterating with a for loop over the columns?
Here I have a small example:
import pandas as pd
PriceTable = pd.DataFrame({
'Dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'Stock1':[10, 12, 13.5, 11.5],
'Stock2':[5, 5.5, 'NaN', 'NaN'],
'Stock3':['NaN', 9, 9.5, 10.5],
'Stock4':[20, 20, 19.5, 15]})
I try to get a ReturnTable which is just the division of the price at time t and t-1.
ReturnTable = pd.DataFrame({
'Dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'Stock1':[0, 1.2, 1.125, 0.852],
'Stock2':[0, 1.1, 'NaN', 'NaN'],
'Stock3':['NaN', 0, 1.056, 1.105],
'Stock4':[0, 1, 0.975, 0.769]})
Thanks a lot!
CodePudding user response:
You could use shift
and a division:
NB. 'NaN'
are strings, so you first need to convert to float('nan')
PriceTable = PriceTable.replace('NaN', float('nan'))
cols = PriceTable.select_dtypes('number').columns
ReturnTable = PriceTable.copy()
ReturnTable[cols] = PriceTable[cols]/PriceTable[cols].shift()
output:
Dates Stock1 Stock2 Stock3 Stock4
0 2021-01-01 NaN NaN NaN NaN
1 2021-01-02 1.2000 1.1000 NaN 1.0000
2 2021-01-03 1.1250 NaN 1.0556 0.9750
3 2021-01-04 0.8519 NaN 1.1053 0.7692