Apply function to one random row per group (in specified set of groups)

Time:09-23

I have the following data frame:

df = data.frame(name = c("abc", "abc", "abc", "def", "def", "ghi", "ghi", "jkl", "jkl", "jkl", "jkl", "jkl"), ignore = c(0,1,0,0,1,1,1,0,0,0,1,1), time = 31:42)

name | ignore | time |
-----|--------|------|
abc  | 0      | 31   |
abc  | 1      | 32   |
abc  | 0      | 33   |
def  | 0      | 34   |
def  | 1      | 35   |
ghi  | 1      | 36   |
ghi  | 1      | 37   |
jkl  | 0      | 38   |
jkl  | 0      | 39   |
jkl  | 0      | 40   |
jkl  | 1      | 41   |
jkl  | 1      | 42   |

and I want to do the following:

  1. Group by name
  2. If ignore is all non-zero in a group, leave the time values as is for this group
  3. If ignore contains at least one zero in a group (e.g. where name is jkl), randomly choose one of the rows in this group where ignore is zero, and apply a function f to the time value.

For example, if f(x) = x - 30, I would expect to see something like this:

name | ignore | time |
-----|--------|------|
abc  | 0      | 1    | <- changed
abc  | 1      | 32   |
abc  | 0      | 33   |
def  | 0      | 4    | <- changed
def  | 1      | 35   |
ghi  | 1      | 36   | <- unchanged group
ghi  | 1      | 37   | <- unchanged group
jkl  | 0      | 38   |
jkl  | 0      | 39   |
jkl  | 0      | 10   | <- changed
jkl  | 1      | 41   |
jkl  | 1      | 42   |

I'm finding it hard to get an elegant solution to this. I am not sure how to apply a function to randomly selected rows within a group, nor what the best approach is for only applying a function to selected groups. I would ideally like to solve this via dplyr, but no problem if not.

CodePudding user response:

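library(dplyr)
f <- function(x) x - 30
df %>% 
  group_by(name) %>% 
  # samp is the within-group position of one random row with ignore == 0, or 0L (matches no row).
  # Indexing with sample.int() avoids sample(n, 1) drawing from 1:n when only one row qualifies.
  mutate(samp = if (any(ignore == 0)) which(ignore == 0)[sample.int(sum(ignore == 0), 1)] else 0L,
         time = ifelse(row_number() != samp, time, f(time))) %>% 
  ungroup() %>% 
  select(-samp)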

output

   name  ignore  time
   <chr>  <dbl> <dbl>
 1 abc        0     1
 2 abc        1    32
 3 abc        0    33
 4 def        0     4
 5 def        1    35
 6 ghi        1    36
 7 ghi        1    37
 8 jkl        0     8
 9 jkl        0    39
10 jkl        0    40
11 jkl        1    41
12 jkl        1    42
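
Since the question says a non-dplyr solution is fine too, here is a minimal base R sketch of the same idea. The helper name pick_one and the variables candidates, chosen and result are my own, made up for illustration. One thing to watch out for: sample(x, 1) treats a single number x >= 1 as "sample from 1:x", so a group whose only ignore == 0 row is, say, row 4 could end up with row 1, 2 or 3 picked instead. Indexing with sample.int(length(x), 1) sidesteps that.

```r
# Data frame from the question
df <- data.frame(
  name   = c("abc", "abc", "abc", "def", "def", "ghi", "ghi",
             "jkl", "jkl", "jkl", "jkl", "jkl"),
  ignore = c(0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1),
  time   = 31:42
)

f <- function(x) x - 30

# Hypothetical helper: pick one element of x, safe even when length(x) == 1
pick_one <- function(x) x[sample.int(length(x), 1)]

# Row numbers of df where ignore is zero, split by group;
# drop = TRUE omits groups (such as ghi) that have no such row
zero_rows  <- which(df$ignore == 0)
candidates <- split(zero_rows, df$name[zero_rows], drop = TRUE)

# One randomly chosen row number per remaining group
chosen <- vapply(candidates, pick_one, integer(1))

# Apply f only to the chosen rows; everything else is left as is
result <- df
result$time <- as.numeric(result$time)
result$time[chosen] <- f(result$time[chosen])
result
```

Which rows get changed in abc and jkl varies from run to run; call set.seed() first if you need a reproducible result.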