Mutate a column using case_when for n% of a group

I have a data frame:

df<-data.frame(id=rep(1:10,each=10),
               Room1=rnorm(100,0.4,0.5),
               Room2=rnorm(100,0.3,0.5),
               Room3=rnorm(100,0.7,0.5))

I want to overwrite the Room1 column for one group (the rows where id == 10) using case_when:

library(dplyr)

data <- df %>%
  mutate(Room1 = case_when(
    id == 10 ~ 0.6,
    TRUE ~ as.numeric(Room1)
  ))

However, I only want to do this for 20% of the rows where id == 10, and that 20% should be chosen at random. Can anyone help? Thanks in advance.

CodePudding user response：

Group by id, then use dplyr::percent_rank(runif(n())) <= .2 to flag a random subset of roughly 20% of the rows within each id (exactly 2 of the 10 rows per group in this data).

I assume you intend to add more conditions to your case_when() -- otherwise, you can use if_else() instead.

library(dplyr)

set.seed(13)  # makes the random selection reproducible

data <- df %>%
  group_by(id) %>% 
  mutate(Room1 = case_when(
    id == 10 & percent_rank(runif(n())) <= .2 ~ 0.6,
    TRUE ~ Room1
  )) %>% 
  ungroup()

tail(data, 10)

# A tibble: 10 × 4
      id  Room1   Room2   Room3
   <int>  <dbl>   <dbl>   <dbl>
 1    10  0.590  0.801   0.745 
 2    10  0.117  0.517  -0.491 
 3    10 -0.207  0.533   2.15  
 4    10 -0.282 -0.249   0.828 
 5    10  0.6    0.605   0.778 
 6    10  0.272  0.308   0.0575
 7    10 -0.213  0.668   0.476 
 8    10  0.507  0.923  -0.0948
 9    10  0.434 -0.0663  0.0720
10    10  0.6    0.264   0.647
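
A note on the percent_rank() approach: as far as I understand it, percent_rank() rescales ranks to (rank - 1) / (n - 1), so the `<= .2` cutoff picks floor(0.2 * (n - 1)) + 1 rows per group. That is exactly 2 of 10 here, but for other group sizes the share can drift away from 20%. A quick sketch to check this (count_selected is a hypothetical helper name, and the group sizes are arbitrary choices for illustration):

```r
library(dplyr)

set.seed(13)

# Hypothetical helper: how many of n rows pass the percent_rank() cutoff
count_selected <- function(n, p = 0.2) {
  sum(percent_rank(runif(n)) <= p)
}

sizes  <- c(4, 7, 10, 20)   # arbitrary group sizes, not from the question
counts <- sapply(sizes, count_selected)

# Compare the selected share with the 20% target for each group size
data.frame(n = sizes, selected = counts, share = counts / sizes)
```

With 10 rows per group, as in the question, the cutoff gives exactly 2 rows, so this only matters if your real group sizes differ.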

CodePudding user response：

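A dplyr solution using sample(). Within each group, sample(n()) returns a random permutation of 1 to n(), so comparing it with n() * 0.2 flags floor(0.2 * n()) randomly chosen rows: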

library(dplyr)  

df %>%
  group_by(id) %>%
  mutate(Room1 = case_when(
    id == 10 & sample(n()) <= n()*0.2 ~ 0.6,
    TRUE ~ Room1
  )) %>%
  ungroup()

Output (no seed is set, so the selected rows differ between runs):

# A tibble: 100 × 4
       id   Room1   Room2   Room3
    <int>   <dbl>   <dbl>   <dbl>
...
 91    10  0.132  -0.595   0.390 
 92    10  0.258  -0.0995  0.580 
 93    10  0.239   0.503   0.960 
 94    10  0.6     0.789   0.744 
 95    10  0.878   0.308   1.21  
 96    10  1.24    0.523   1.73  
 97    10 -0.0795 -0.263   0.546 
 98    10  0.6    -0.224   0.695 
 99    10 -0.194   0.524  -0.167 
100    10  0.665   0.639  -0.0578
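
For comparison, the same idea can probably be written without dplyr. This is only a minimal base R sketch, assuming a single target group and a fixed replacement value as in the question; target_id, p, idx, k and pick are illustrative names, and floor() is used to mirror the `<= n() * 0.2` comparison above:

```r
set.seed(13)  # only so the illustration is reproducible

df <- data.frame(id = rep(1:10, each = 10),
                 Room1 = rnorm(100, 0.4, 0.5),
                 Room2 = rnorm(100, 0.3, 0.5),
                 Room3 = rnorm(100, 0.7, 0.5))

target_id <- 10   # group to modify (illustrative name)
p <- 0.2          # share of that group's rows to overwrite

idx <- which(df$id == target_id)   # row positions belonging to the group
k <- floor(p * length(idx))        # number of rows to overwrite

# Index into idx with sample.int() rather than calling sample(idx, k),
# because sample(x, k) treats a length-one x as 1:x
pick <- idx[sample.int(length(idx), k)]

df$Room1[pick] <- 0.6

# Count the overwritten rows in the target group
sum(df$Room1[df$id == target_id] == 0.6)
```

Because the rows are chosen by position first and assigned afterwards, this version is easy to extend to several groups by looping over the ids, at the cost of being more verbose than the case_when() pipelines.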