I wrote a lapply-function in order to assign stock-prices around a certain date to specific companies. All my companies, for which I want to assign stock-prices are in the dataset "peers_per_deal_germany".
My first step is to identify based on the date and the company specific, ISIN, whether there is such a company in my Stock_Prices dataset. If "yes", I defined a certain timeframe for which I want to get the stock data. Then I further defined some restrictions such as not more than 40 NA's in the vector, as this would disturb my results.
The code works perfectly fine. However, for ~15'000 companies the code took around 1 hour to process. My full dataset contains around 1.8 Mio. companies for which I would need the stock prices.
Is there any way I can speed this lapply-function up? Thank you so much for your help.
get_return_vector_germany <- function(idx, peer_company, SIC, ISIN,
deal, announcement, peer_country) {
peer <- peer_company[idx]
SIC <- SIC[idx]
Deal_Nr <- deal[idx]
company_ticker <- ISIN[idx]
announcement_date <- announcement[idx]
peer_country <- peer_country[idx]
row <- c()
vector_stock_prices <- c()
vector_stock_return <- c()
vector_stock_prices_event <- c()
vector_stock_return_event <- c()
if (length(which(Stock_Prices_DE$datadate == announcement_date &
Stock_Prices_DE$isin == company_ticker, arr.ind = TRUE)) ==
0) {
row <- NA
} else {
row <- which(Stock_Prices_DE$datadate == announcement_date &
Stock_Prices_DE$isin == company_ticker, arr.ind = TRUE)
}
if (sum(is.na(row) == 1)) {
vector_stock_prices <- rep(NA, times = 179)
} else {
vector_stock_prices <- Stock_Prices_DE[(row - 218):(row - 39),
7]
}
if (sum(is.na(vector_stock_prices)) > 40) {
vector_stock_return <- list(rep(NA, times = 179))
} else {
vector_stock_return <- list(diff(vector_stock_prices)/
vector_stock_prices[-length(vector_stock_prices)])
}
if (sum(is.na(row) == 1)) {
vector_stock_prices_event <- rep(NA, times = 22)
} else {
vector_stock_prices_event <- Stock_Prices_DE[(row - 11):(row
10), 7]
}
if (sum(is.na(vector_stock_prices_event)) > 0) {
vector_stock_return_event <- list(rep(NA, times = 21))
} else {
vector_stock_return_event <- list(diff(vector_stock_prices_event)/
vector_stock_prices_event[-length(vector_stock_prices_event)])
}
vector <- data.frame(cbind(peer, Deal_Nr, SIC, peer_country, vector_stock_return,
vector_stock_return_event))
return(vector)
}
results_vector_germany <- lapply(1:nrow(peers_per_deal_germany), get_return_vector_germany, peers_per_deal_germany$peer_company, peers_per_deal_germany$current_SIC, peers_per_deal_germany$ISIN_code, peers_per_deal_germany$deal_nr, peers_per_deal_germany$current_announcement, peers_per_deal_germany$peer_country)
CodePudding user response:
try to do the task in parallel using mclapply
.