Home > database >  How to speed up a for loop using lapply?
How to speed up a for loop using lapply?

Time:12-14

I wrote a lapply-function in order to assign stock-prices around a certain date to specific companies. All my companies, for which I want to assign stock-prices are in the dataset "peers_per_deal_germany".

My first step is to identify based on the date and the company specific, ISIN, whether there is such a company in my Stock_Prices dataset. If "yes", I defined a certain timeframe for which I want to get the stock data. Then I further defined some restrictions such as not more than 40 NA's in the vector, as this would disturb my results.

The code works perfectly fine. However, for ~15'000 companies the code took around 1 hour to process. My full dataset contains around 1.8 Mio. companies for which I would need the stock prices.

Is there any way I can speed this lapply-function up? Thank you so much for your help.

get_return_vector_germany <- function(idx, peer_company, SIC, ISIN,
                                      deal, announcement, peer_country) {
  peer <- peer_company[idx]
  SIC <- SIC[idx]
  Deal_Nr <- deal[idx]
  company_ticker <- ISIN[idx]
  announcement_date <- announcement[idx]
  peer_country <- peer_country[idx]
  row <- c()
  vector_stock_prices <- c()
  vector_stock_return <- c()
  vector_stock_prices_event <- c()
  vector_stock_return_event <- c()
  
  if (length(which(Stock_Prices_DE$datadate == announcement_date &
                   Stock_Prices_DE$isin == company_ticker, arr.ind = TRUE)) ==
      0) {
    row <- NA
  } else {
    row <- which(Stock_Prices_DE$datadate == announcement_date &
                   Stock_Prices_DE$isin == company_ticker, arr.ind = TRUE)
  }
  
  if (sum(is.na(row) == 1)) {
    vector_stock_prices <- rep(NA, times = 179)
  } else {
    vector_stock_prices <- Stock_Prices_DE[(row - 218):(row - 39),
                                           7]
  }
  
  if (sum(is.na(vector_stock_prices)) > 40) {
    vector_stock_return <- list(rep(NA, times = 179))
  } else {
    vector_stock_return <- list(diff(vector_stock_prices)/
                                  vector_stock_prices[-length(vector_stock_prices)])
  }
  
  if (sum(is.na(row) == 1)) {
    vector_stock_prices_event <- rep(NA, times = 22)
  } else {
    vector_stock_prices_event <- Stock_Prices_DE[(row - 11):(row  
                                                               10), 7]
  }
  
  if (sum(is.na(vector_stock_prices_event)) > 0) {
    vector_stock_return_event <- list(rep(NA, times = 21))
  } else {
    vector_stock_return_event <- list(diff(vector_stock_prices_event)/
                                        vector_stock_prices_event[-length(vector_stock_prices_event)])
  }
  
  vector <- data.frame(cbind(peer, Deal_Nr, SIC, peer_country, vector_stock_return,
                             vector_stock_return_event))
  return(vector)
}


results_vector_germany <- lapply(1:nrow(peers_per_deal_germany), get_return_vector_germany, peers_per_deal_germany$peer_company, peers_per_deal_germany$current_SIC, peers_per_deal_germany$ISIN_code, peers_per_deal_germany$deal_nr, peers_per_deal_germany$current_announcement, peers_per_deal_germany$peer_country)

CodePudding user response:

try to do the task in parallel using mclapply.

  • Related