Home > other >  Replace the values of one column with values of another column in R dplyr
Replace the values of one column with values of another column in R dplyr

Time:09-01

I have a data frame that looks like this. I want inside the dplyr pipeline to replace only the 7 first rows of the threshold column with the values that come from the manufacturer column.

library(tidyverse)

mpg %>% 
  arrange(cty) %>% 
  mutate(threshold=NA) %>% 
  select(manufacturer,cty, threshold)
#> # A tibble: 234 × 3
#>    manufacturer   cty threshold
#>    <chr>        <int> <lgl>    
#>  1 dodge            9 NA       
#>  2 dodge            9 NA       
#>  3 dodge            9 NA       
#>  4 dodge            9 NA       
#>  5 jeep             9 NA       
#>  6 chevrolet       11 NA       
#>  7 chevrolet       11 NA       
#>  8 chevrolet       11 NA       
#>  9 dodge           11 NA       
#> 10 dodge           11 NA       
#> # … with 224 more rows

Created on 2022-08-31 with reprex v2.0.2

I want my data to look like this

#>    manufacturer   cty threshold
#>    <chr>        <int> <lgl>    
#>  1 dodge            9 dodge      
#>  2 dodge            9 dodge       
#>  3 dodge            9 dodge       
#>  4 dodge            9 dodge      
#>  5 jeep             9 jeep      
#>  6 chevrolet       11 chevrolet        
#>  7 chevrolet       11 chevrolet       
#>  8 chevrolet       11 NA       
#>  9 dodge           11 NA       
#> 10 dodge           11 NA  

any help and guidance as always are highly appreciated

CodePudding user response:

USe case_when to create a logical condition with row_number() for replacement. In addition, there is no need to create a blank column i.e. the NAs can be filled by default in case_when

library(dplyr)
library(ggplot2)
mpg %>% 
  arrange(cty) %>%   
  select(manufacturer, cty) %>% 
  mutate(threshold = case_when(row_number() < 7 ~manufacturer))

-output

# A tibble: 234 × 3
   manufacturer   cty threshold
   <chr>        <int> <chr>    
 1 dodge            9 dodge    
 2 dodge            9 dodge    
 3 dodge            9 dodge    
 4 dodge            9 dodge    
 5 jeep             9 jeep     
 6 chevrolet       11 chevrolet
 7 chevrolet       11 <NA>     
 8 chevrolet       11 <NA>     
 9 dodge           11 <NA>     
10 dodge           11 <NA>     
# … with 224 more rows

CodePudding user response:

Here is version using row_number() with ifelse and %in%:

mpg %>% 
  arrange(cty) %>% 
  mutate(threshold=NA) %>% 
  select(manufacturer,cty, threshold) %>% 
  mutate(threshold = ifelse(row_number() %in% 1:7, manufacturer, threshold))
   manufacturer   cty threshold
   <chr>        <int> <chr>    
 1 dodge            9 dodge    
 2 dodge            9 dodge    
 3 dodge            9 dodge    
 4 dodge            9 dodge    
 5 jeep             9 jeep     
 6 chevrolet       11 chevrolet
 7 chevrolet       11 chevrolet
 8 chevrolet       11 NA       
 9 dodge           11 NA       
10 dodge           11 NA       
# ... with 224 more rows
# i Use `print(n = ...)` to see more rows

CodePudding user response:

Here's yet another version. Similar to the ones above, but using the function rowid_to_column().

library(dplyr)

mpg %>% 
  arrange(cty) %>%
  select(manufacturer, cty) %>%
  rowid_to_column() %>%
  mutate(threshold = ifelse(rowid < 8, manufacturer, NA))

# A tibble: 234 x 4
   rowid manufacturer   cty threshold
   <int> <chr>        <int> <chr>    
 1     1 dodge            9 dodge    
 2     2 dodge            9 dodge    
 3     3 dodge            9 dodge    
 4     4 dodge            9 dodge    
 5     5 jeep             9 jeep     
 6     6 chevrolet       11 chevrolet
 7     7 chevrolet       11 chevrolet
 8     8 chevrolet       11 NA       
 9     9 dodge           11 NA       
10    10 dodge           11 NA       
# ... with 224 more rows
# i Use `print(n = ...)` to see more rows
  • Related