Home > front end >  Mutate column if column exists loop dplyr
Mutate column if column exists loop dplyr

Time:03-15

I want to do a simple task: mutate the value of one column if that column exists. I want to do this for multiple columns and data frames within a loop.

I made a reproducible example. In the example, we have two data frames Belg and Fran.

Fran has two columns t1 and t2 while Belg has three t1, t2, t3. I want to replace the values in these two or three columns based on a simple manipulation described in the code below. The tricky part is that because the number of columns is not the same on both data frames the code below generates a third column t3 to df Fran -- which is not what I want.

df <- data.frame (v1 = c("Fran", "Fran", "Fran", "Belg", "Belg", "Belg"),
                   v2 = c(1201, 1201, 1202, 1203, 1204, 1205),
                   v3 = c(1, 2, 1, 1, 3, 1)
)

Fran_t <- data.frame(v1 = c("Fran", "Fran"),
                  t1 = c(NA, NA), 
                  t2 = c(NA, NA)
)

Belg_t <- data.frame(v1 = c("Belg", "Belg", "Belg"),
                   t1 = c(NA, NA, NA), 
                   t2 = c(NA, NA, NA), 
                   t3 = c(NA, NA, NA)
)

cty <- c("Fran", "Belg")

for (c in cty) {
  a <- df
  d <- paste0(c,"_t")
  
  a <- a %>%
    filter(v1==paste(c)) %>%
    left_join(., get(d), by="v1") %>%
    group_by(v2) %>%
    mutate("t1" = case_when(any(v3==1) ~ "NAv", TRUE ~ "NAp")) %>%
    mutate("t2" = case_when(any(v3==2) ~ "NAv", TRUE ~ "NAp")) %>%
    mutate("t3" = case_when(any(v3==3) ~ "NAv", TRUE ~ "NAp")) 
  assign(c, a)
}

I have tried things like:

mutate({if("t1" %in% names(.)) "t1" = case_when(any(v3==1) ~ "NAv") else "NULL"}) 

or even 


{if("t1" %in% names(.)) mutate("t1" = case_when(any(v3==1) ~ "NAv", TRUE ~ "NAp")) else NULL}
``

CodePudding user response:

You can place Fran_t and Belg_t in a list. Note that this does not change the underlying frame Fran_t and Belg_t, but your original loop did not do that either; rather it created two new frames called Fran and Belg. This happened because of your assign(c,a) call. If you want to actually change Fran_t and Belg_t, see an option at the bottom:

cty <- list(Fran_t, Belg_t)

lapply(cty, function(c) {
  df%>%
    inner_join(c, by="v1") %>%
    group_by(v2) %>% 
    mutate(across(starts_with("t"), ~if_else(any(v3==1),"NAv", "NAp")))
})

Output:

[[1]]
# A tibble: 6 x 5
# Groups:   v2 [2]
  v1       v2    v3 t1    t2   
  <chr> <dbl> <dbl> <chr> <chr>
1 Fran   1201     1 NAv   NAv  
2 Fran   1201     1 NAv   NAv  
3 Fran   1201     2 NAv   NAv  
4 Fran   1201     2 NAv   NAv  
5 Fran   1202     1 NAv   NAv  
6 Fran   1202     1 NAv   NAv  

[[2]]
# A tibble: 9 x 6
# Groups:   v2 [3]
  v1       v2    v3 t1    t2    t3   
  <chr> <dbl> <dbl> <chr> <chr> <chr>
1 Belg   1203     1 NAv   NAv   NAv  
2 Belg   1203     1 NAv   NAv   NAv  
3 Belg   1203     1 NAv   NAv   NAv  
4 Belg   1204     3 NAp   NAp   NAp  
5 Belg   1204     3 NAp   NAp   NAp  
6 Belg   1204     3 NAp   NAp   NAp  
7 Belg   1205     1 NAv   NAv   NAv  
8 Belg   1205     1 NAv   NAv   NAv  
9 Belg   1205     1 NAv   NAv   NAv  

Option to change Fran_t and Belg_t:

cty <- c("Fran_t", "Belg_t")

for(c in cty) {
  assign(c, df%>%
    inner_join(get(c), by="v1") %>%
    group_by(v2) %>% 
    mutate(across(starts_with("t"), ~if_else(any(v3==1),"NAv", "NAp")))
  )
}
  • Related