Home > Mobile >  How can I use a vectorised function to multiply all values in a different data frame for a given ID
How can I use a vectorised function to multiply all values in a different data frame for a given ID

Time:02-21

I have a huge dataset with 750,000 IDs, for which I want to aggregate monthly values to yearly values by multiplying all values for a given ID. The ID consists of a combination of an identification number and a year.

The data I want to extract:

ID monthly value
1 - 1997 Product of Monthly Values in Year 1997
1 - 1998 Product of Monthly Values in Year 1998
1 - 1999 Product of Monthly Values in Year 1999
... ...
2 - 1997 Product of Monthly Values in Year 1997
2 - 1998 Product of Monthly Values in Year 1998
2 - 1999 Product of Monthly Values in Year 1999
... ...

The dataset which is the source:

ID monthly value
1 - 1997 Monthly Value 1 in Year 1997
1 - 1997 Monthly Value 2 in Year 1997
1 - 1997 Monthly Value 3 in Year 1997
... ...
2 - 1997 Monthly Value 1 in Year 1997
2 - 1997 Monthly Value 2 in Year 1997
2 - 1997 Monthly Value 3 in Year 1997
... ...

I have written a for loop, which takes about 0.74s for 10 IDs, which is way to slow. It would take about 15 hours for the whole data to run through. The for loop multiplies all monthly values for a given ID and stores it in a separate data frame.

for (i in 1:nrow(yearlyreturns)){
  
  yearlyreturns[i, "yret"] <- prod(monthlyreturns[monthlyreturns$ID == yearlyreturns[i,"ID"],"change"]) - 1
  yearlyreturns[i, "monthcount"] <- length(monthlyreturns[monthlyreturns$ID == yearlyreturns[i,"ID"],"change"])
  
}

I don't know how to get from here to a vectorised function, which takes less time.

Is this possible to do in R?

CodePudding user response:

Something like this:

library(dplyr)

df %>% 
  mutate(monthly_value = paste("Product of", str_replace(monthly_value, 'Value\\s\\d', 'Values'))) %>% 
  group_by(ID, monthly_value) %>% 
  summarise()
  ID       monthly_value                         
  <chr>    <chr>                                 
1 1 - 1997 Product of Monthly Values in Year 1997
2 2 - 1997 Product of Monthly Values in Year 1997

data:

structure(list(ID = c("1 - 1997", "1 - 1997", "1 - 1997", "2 - 1997", 
"2 - 1997", "2 - 1997"), monthly_value = c("Monthly Value 1 in Year 1997", 
"Monthly Value 2 in Year 1997", "Monthly Value 3 in Year 1997", 
"Monthly Value 1 in Year 1997", "Monthly Value 2 in Year 1997", 
"Monthly Value 3 in Year 1997")), class = "data.frame", row.names = c(NA, 
-6L))

CodePudding user response:

Based on the for loop code, this may be a done with a join

library(data.table)
setDT(yearlyreturns)[monthlyreturns, c("yret", "monthcount") 
     := .(prod(change) -1, .N), on = .(ID), by = .EACHI]
  • Related