Home > front end >  Looking to optimize a mutate(case_when( )) function in R, with the same mutations required across a
Looking to optimize a mutate(case_when( )) function in R, with the same mutations required across a

Time:01-31

I am trying to find a vectorized, general way to accomplish a task in R for which I typically use Stata. I have a list of variables which have a common structure, and I am hoping to apply the same transformations to all of them efficiently. There are two versions of each variable (x and m), each of which only have a value (some numeric, some char) if the observation is in group 1 or 2, respectively. I am trying to create a general variable which contains only the corresponding non-NA value for the row. In Stata, I would accomplish this using the below code:

foreach word in cost cost_add25 cost_less25 parkingaccess parkingaccess_add25 parkingaccess_less25 timeresult timeresult_add25 timeresult_less25 timetravel timetravel_add25 timetravel_less25 timewait timewait_add25 timewait_less25 freeparking onlinesch service service_less1 service_plus1 pcprec specialty {
    gen statusquo_`word' = statusquo_x_`word' if group == 1
    replace statusquo_`word' = statusquo_m_`word' if group == 2
}

However I have so far been stumped on how to efficiently do this in R with a vectorized function-- I have tried using purrr:map, lapply, and mapply, but have come up short. Here is my current implementation, which is pretty gross:

database <- database %>%
  mutate(
    statusquo_cost                 = case_when(group == 1 ~ statusquo_x_cost                 , group == 2 ~ statusquo_m_cost                 ),
    statusquo_cost_add25           = case_when(group == 1 ~ statusquo_x_cost_add25           , group == 2 ~ statusquo_m_cost_add25           ),
    statusquo_cost_less25          = case_when(group == 1 ~ statusquo_x_cost_less25          , group == 2 ~ statusquo_m_cost_less25          ),
    statusquo_parkingaccess        = case_when(group == 1 ~ statusquo_x_parkingaccess        , group == 2 ~ statusquo_m_parkingaccess        ),
    statusquo_parkingaccess_add25  = case_when(group == 1 ~ statusquo_x_parkingaccess_add25  , group == 2 ~ statusquo_m_parkingaccess_add25  ),
    statusquo_parkingaccess_less25 = case_when(group == 1 ~ statusquo_x_parkingaccess_less25 , group == 2 ~ statusquo_m_parkingaccess_less25 ),
    statusquo_timeresult           = case_when(group == 1 ~ statusquo_x_timeresult           , group == 2 ~ statusquo_m_timeresult           ),
    statusquo_timeresult_add25     = case_when(group == 1 ~ statusquo_x_timeresult_add25     , group == 2 ~ statusquo_m_timeresult_add25     ),
    statusquo_timeresult_less25    = case_when(group == 1 ~ statusquo_x_timeresult_less25    , group == 2 ~ statusquo_m_timeresult_less25    ),
    statusquo_timetravel           = case_when(group == 1 ~ statusquo_x_timetravel           , group == 2 ~ statusquo_m_timetravel           ),
    statusquo_timetravel_add25     = case_when(group == 1 ~ statusquo_x_timetravel_add25     , group == 2 ~ statusquo_m_timetravel_add25     ),
    statusquo_timetravel_less25    = case_when(group == 1 ~ statusquo_x_timetravel_less25    , group == 2 ~ statusquo_m_timetravel_less25    ),
    statusquo_timewait             = case_when(group == 1 ~ statusquo_x_timewait             , group == 2 ~ statusquo_m_timewait             ),
    statusquo_timewait_add25       = case_when(group == 1 ~ statusquo_x_timewait_add25       , group == 2 ~ statusquo_m_timewait_add25       ),
    statusquo_timewait_less25      = case_when(group == 1 ~ statusquo_x_timewait_less25      , group == 2 ~ statusquo_m_timewait_less25      ),
    statusquo_freeparking          = case_when(group == 1 ~ statusquo_x_freeparking          , group == 2 ~ statusquo_m_freeparking          ),
    statusquo_onlinesch            = case_when(group == 1 ~ statusquo_x_onlinesch            , group == 2 ~ statusquo_m_onlinesch            ),
    statusquo_service              = case_when(group == 1 ~ statusquo_x_service              , group == 2 ~ statusquo_m_service              ),
    statusquo_service_less1        = case_when(group == 1 ~ statusquo_x_service_less1        , group == 2 ~ statusquo_m_service_less1        ),
    statusquo_service_plus1        = case_when(group == 1 ~ statusquo_x_service_plus1        , group == 2 ~ statusquo_m_service_plus1        ),
    statusquo_pcprec               = case_when(group == 1 ~ statusquo_x_pcprec               , group == 2 ~ statusquo_m_pcprec               ),
    statusquo_specialty            = case_when(group == 1 ~ statusquo_x_specialty            , group == 2 ~ statusquo_m_specialty            )
    )   

In an ideal world, I would have something like this, where "VAR" is cycled through the values in the above list, however R is very type-sensitive and treats the new variable name as text.

vars <- c("cost","cost_add25","cost_less25","parkingaccess","parkingaccess_add25","parkingaccess_less25","timeresult","timeresult_add25","timeresult_less25","timetravel","timetravel_add25","timetravel_less25","timewait","timewait_add25","timewait_less25","freeparking","onlinesch","service","service_less1","service_plus1","pcprec","specialty")

database <- database %>%
  mutate(
    statusquo_VAR = case_when(group == 1 ~ statusquo_x_VAR, group == 2 ~ statusquo_m_VAR)
  )  

Any suggestions would be very helpful! I can provide a data sample if needed, though I am not sure this question is complicated enough to need one.

I tried to use a vectorized operation to combine a list of two variables together, based on a structured list of their names, and am unable to do so without a cumbersome approach.

CodePudding user response:

I have modified your code, hopefully it will get what you wanted. Another minor point, I used ifelse to replace case_when, either should work though.

vars <- c("cost","cost_add25","cost_less25","parkingaccess","parkingaccess_add25","parkingaccess_less25","timeresult","timeresult_add25","timeresult_less25","timetravel","timetravel_add25","timetravel_less25","timewait","timewait_add25","timewait_less25","freeparking","onlinesch","service","service_less1","service_plus1","pcprec","specialty")

for (i in seq_along(vars)){
database <- database %>%
  mutate(
    !!paste0("statusquo_", vars[i]) := ifelse(group == 1, get(paste0("statusquo_x_", vars[i])), get(paste0("statusquo_m_", vars[i])))
  ) 
}

Edit:

With newer version of dplyr >= 1.0

for (i in vars){
  database <- database %>%
    mutate(
      "statusquo_{i}" := ifelse(group == 1, get(paste0("statusquo_x_", i)), get(paste0("statusquo_m_", i)))
    ) 
}
  • Related