I have a large dataframe (resultsDF
) set up with a sort-of three column index: [Ticker, Fiscal Year, Fiscal Period]
like so
Ticker Fiscal Year Fiscal Period Market Returns ROI
21 AA 2017 Q3 0.028904 0.006549
23 AA 2019 Q2 0.038022 -0.026191
24 AA 2018 Q3 0.024746 -0.002533
25 AA 2018 Q4 -0.058857 0.004141
26 AA 2019 Q3 0.010057 -0.015065
... ... ... ... ... ...
36582 ZTS 2017 Q1 0.056305 0.031115
36583 ZTS 2019 Q1 0.014543 0.028669
36584 ZTS 2018 Q4 -0.058857 0.032013
36585 ZTS 2019 Q4 0.033800 0.033261
36586 ZTS 2020 Q1 -0.041786 0.036693
Part one:
I'm creating a column Beat Market (Overall)
to track if the average ROI for the ticker is greater than the average return of the market. Since it tracks the averages, that new column will be either one or zero for the whole stock. So every value in Beat Market (Overall)
for AA
might be 1, where as every value in Beat Market (Overall)
for ZTS
might be 0. The complicated thing about it is that those ROI
's would need to be summed for each ticker, possibly using the groupby
function, but I'm not sure how to use that in this context.
Part two: I have a lookup dataframe, companies
, that holds industry id information for each ticker. I want to match up the Sector
column in the final dataframe to the correct industry id for each ticker without using a loop.
The problem is that running this code for this dataset takes a way too long, and if I could vectorize it, it would be much faster.
Here's the loop I'd like to vectorize:
# find if AVG returns for each ticker is greater than AVG returns for market
avg_ror = resultsDF["Market Returns"].unique().mean()
for index, row in resultsDF.iterrows():
roi = resultsDF[resultsDF["Ticker"] == row["Ticker"]]["ROI"].mean()
resultsDF.loc[index, "Beat Market (Overall)"] = 1 if roi > avg_ror else 0
try:
resultsDF.loc[index, "Sector"] = companies.loc[row["Ticker"]][
"IndustryId"
] # ["Sector"]
except:
pass
The final dataframe (less some other columns) would look like this:
Ticker Fiscal Year Fiscal Period Sector Market Returns ROI Beat Market (Overall)
21 AA 2017 Q3 110004 0.028904 0.006549 0.0
23 AA 2019 Q2 110004 0.038022 -0.026191 0.0
24 AA 2018 Q3 110004 0.024746 -0.002533 0.0
25 AA 2018 Q4 110004 -0.058857 0.004141 0.0
26 AA 2019 Q3 110004 0.010057 -0.015065 0.0
... ... ... ... ... ... ... ...
36582 ZTS 2017 Q1 106005.0 0.056305 0.031115 1.0
36583 ZTS 2019 Q1 106005.0 0.014543 0.028669 1.0
36584 ZTS 2018 Q4 106005.0 -0.058857 0.032013 1.0
36585 ZTS 2019 Q4 106005.0 0.033800 0.033261 1.0
36586 ZTS 2020 Q1 106005.0 -0.041786 0.036693 1.0
CodePudding user response:
(1) You can groupby
on Ticker
find group mean, transform
it and compare it to market returns mean for every item in the dataframe
(2) You can use .map
method to map to Sector
in companies
dataframe (here I assumed AA
is in sector A
and ZTS
is in B
)
resultsDF['Beat Market (Overall)'] = (resultsDF.groupby('Ticker')['ROI'].transform(np.mean) > resultsDF['Market Returns'].unique().mean()).astype(int)
resultsDF['Sector'] = resultsDF['Ticker'].map(companies.set_index('Ticker')['IndustryId'])
Output:
Ticker Fiscal Year Fiscal Period Market Returns ROI Beat Market Sector
21 AA 2017 Q3 0.028904 0.006549 0 A
23 AA 2019 Q2 0.038022 -0.026191 0 A
24 AA 2018 Q3 0.024746 -0.002533 0 A
25 AA 2018 Q4 -0.058857 0.004141 0 A
26 AA 2019 Q3 0.010057 -0.015065 0 A
36582 ZTS 2017 Q1 0.056305 0.031115 1 B
36583 ZTS 2019 Q1 0.014543 0.028669 1 B
36584 ZTS 2018 Q4 -0.058857 0.032013 1 B
36585 ZTS 2019 Q4 0.033800 0.033261 1 B
36586 ZTS 2020 Q1 -0.041786 0.036693 1 B