How to avoid recycling while trying to replace values from a vector in a dataframe column


This question arose while I was working on another question, "Replace list names if they exist".

I have this manipulated iris dataset with two vectors:

library(dplyr)

new_name <- c("new_setoas", "new_virginica")
to_select <- c("setosa", "virginica")
iris %>% 
  group_by(Species) %>% 
  slice(1:2) %>% 
  mutate(Species = as.character(Species))

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
         <dbl>       <dbl>        <dbl>       <dbl> <chr>     
1          5.1         3.5          1.4         0.2 setosa    
2          4.9         3            1.4         0.2 setosa    
3          7           3.2          4.7         1.4 versicolor
4          6.4         3.2          4.5         1.5 versicolor
5          6.3         3.3          6           2.5 virginica 
6          5.8         2.7          5.1         1.9 virginica

I would like to replace the values in Species that appear in one vector (to_select) with the corresponding values from another vector (new_name).

When I do:

new_name <- c("new_setoas", "new_virginica")
to_select <- c("setosa", "virginica")
iris %>% 
  group_by(Species) %>% 
  slice(1:2) %>% 
  mutate(Species = as.character(Species)) %>% 
  mutate(Species = ifelse(Species %in% to_select, new_name, Species))

# I get:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species      
         <dbl>       <dbl>        <dbl>       <dbl> <chr>        
1          5.1         3.5          1.4         0.2 new_setoas   
2          4.9         3            1.4         0.2 **new_virginica** # should be new_setoas
3          7           3.2          4.7         1.4 versicolor   
4          6.4         3.2          4.5         1.5 versicolor   
5          6.3         3.3          6           2.5 **new_setoas** # should be new_virginica   
6          5.8         2.7          5.1         1.9 new_virginica 

I know this happens because of recycling, but I don't know how to avoid it!
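
To make the mechanism concrete, here is a minimal base R sketch of what appears to be happening inside one group. `species_group` is a hypothetical stand-in for the Species values that the grouped mutate() sees for the two setosa rows; it is not part of the original code.

```r
new_name  <- c("new_setoas", "new_virginica")
to_select <- c("setosa", "virginica")

# Hypothetical stand-in for one group after group_by(Species): two "setosa" rows.
species_group <- c("setosa", "setosa")

# ifelse() takes element i of `yes` wherever element i of `test` is TRUE,
# so row 2 receives new_name[2] regardless of its Species value.
mismatched <- ifelse(species_group %in% to_select, new_name, species_group)
print(mismatched)
```

The replacement is chosen by row position, not by which entry of to_select the value matched, which is why a position-independent lookup is needed.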

CodePudding user response:

We may use recode. Instead of grouping and then modifying the grouping column afterwards, the recoding can be done in the group_by step itself:

iris %>% 
  group_by(Species =  recode(as.character(Species),
     !!!setNames(new_name, to_select))) %>% 
  slice(1:2)

# A tibble: 6 × 5
# Groups:   Species [3]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species      
         <dbl>       <dbl>        <dbl>       <dbl> <chr>        
1          5.1         3.5          1.4         0.2 new_setoas   
2          4.9         3            1.4         0.2 new_setoas   
3          7           3.2          4.7         1.4 versicolor   
4          6.4         3.2          4.5         1.5 versicolor   
5          6.3         3.3          6           2.5 new_virginica
6          5.8         2.7          5.1         1.9 new_virginica
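
The key piece is setNames(new_name, to_select), which builds a named vector mapping each old value to its replacement; `!!!` splices it into recode() as if setosa = "new_setoas", virginica = "new_virginica" had been typed out. As a rough illustration of the same lookup idea in base R (a sketch, not how recode is implemented; `lookup`, `species`, `replacement` and `recoded` are names made up for this example):

```r
new_name  <- c("new_setoas", "new_virginica")
to_select <- c("setosa", "virginica")

# Named vector: names are the old values, elements are the replacements.
lookup <- setNames(new_name, to_select)

# Hypothetical stand-in for the Species column of the six-row example.
species <- c("setosa", "setosa", "versicolor",
             "versicolor", "virginica", "virginica")

# Index the lookup by value; values without an entry come back as NA
# and fall through to the original value.
replacement <- unname(lookup[species])
recoded <- ifelse(is.na(replacement), species, replacement)
print(recoded)
```

Here ifelse() is safe because `replacement` has already been expanded to the same length as `species`, so nothing is picked by position from a shorter vector.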

CodePudding user response:

A solution with match is more complicated than akrun's solution, but here it goes.


new_name <- c("new_setoas", "new_virginica")
to_select <- c("setosa", "virginica")

iris %>% 
  group_by(Species) %>% 
  slice(1:2) %>% 
  mutate(Species = as.character(Species)) %>% 
  mutate(i_new = match(Species, to_select)) %>%
  mutate(Species = ifelse(is.na(i_new), Species, new_name[i_new])) %>%
  select(-i_new)
#> # A tibble: 6 × 5
#> # Groups:   Species [3]
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species      
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>        
#> 1          5.1         3.5          1.4         0.2 new_setoas   
#> 2          4.9         3            1.4         0.2 new_setoas   
#> 3          7           3.2          4.7         1.4 versicolor   
#> 4          6.4         3.2          4.5         1.5 versicolor   
#> 5          6.3         3.3          6           2.5 new_virginica
#> 6          5.8         2.7          5.1         1.9 new_virginica

Created on 2022-11-04 with reprex v2.0.2
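
The same match() idea can also be written without ifelse() or a helper column. The following is a minimal base R sketch; `replace_values` is a hypothetical helper name, not a function from any package, and it assumes `from` and `to` have the same length.

```r
# Hypothetical helper: replace each value of x found in `from`
# with the element of `to` at the same position.
replace_values <- function(x, from, to) {
  stopifnot(length(from) == length(to))
  idx <- match(x, from)        # position of each x in `from`, NA if absent
  hit <- !is.na(idx)           # rows that actually need replacing
  x[hit] <- to[idx[hit]]       # assign only where a match was found
  x
}

new_name  <- c("new_setoas", "new_virginica")
to_select <- c("setosa", "virginica")
species   <- c("setosa", "setosa", "versicolor",
               "versicolor", "virginica", "virginica")

result <- replace_values(species, to_select, new_name)
print(result)
```

Inside a pipeline this could be used as mutate(Species = replace_values(Species, to_select, new_name)). Because the replacement is looked up per value, the outcome does not depend on group sizes or row order.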
