I have the following data frame:
df <- data.frame(
  name = c("abc", "abc", "abc", "def", "def", "ghi", "ghi", "jkl", "jkl", "jkl", "jkl", "jkl"),
  ignore = c(0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1),
  time = 31:42
)
name | ignore | time |
-----|--------|------|
abc | 0 | 31 |
abc | 1 | 32 |
abc | 0 | 33 |
def | 0 | 34 |
def | 1 | 35 |
ghi | 1 | 36 |
ghi | 1 | 37 |
jkl | 0 | 38 |
jkl | 0 | 39 |
jkl | 0 | 40 |
jkl | 1 | 41 |
jkl | 1 | 42 |
and I want to do the following:
- Group by name.
- If ignore is all non-zero in a group, leave the time values as is for this group.
- If ignore contains at least one zero in a group (e.g. where name is jkl), randomly choose one of the rows in this group where ignore is zero, and apply a function f to the time value.
For example, if f(x) = x - 30, I would expect to see something like this:
name | ignore | time |
-----|--------|------|
abc | 0 | 1 | <- changed
abc | 1 | 32 |
abc | 0 | 33 |
def | 0 | 4 | <- changed
def | 1 | 35 |
ghi | 1 | 36 | <- unchanged group
ghi | 1 | 37 | <- unchanged group
jkl | 0 | 38 |
jkl | 0 | 39 |
jkl | 0 | 10 | <- changed
jkl | 1 | 41 |
jkl | 1 | 42 |
I'm finding it hard to get an elegant solution to this. I am not sure how to apply a function to randomly selected rows within a group, nor what the best approach is for only applying a function to selected groups. I would ideally like to solve this via dplyr, but no problem if not.
CodePudding user response:
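library(dplyr)

f <- function(x) x - 30
# Pick one element of idx; avoids sample(idx, 1), which draws from 1:idx when idx is a single number
pick_one <- function(idx) idx[sample.int(length(idx), 1)]

df %>%
  group_by(name) %>%
  # samp: within-group position of the chosen ignore == 0 row, or 0 (matches no row) if there is none
  mutate(samp = if (any(ignore == 0)) pick_one(which(ignore == 0)) else 0L,
         time = ifelse(row_number() == samp, f(time), time)) %>%
  select(-samp)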
Output (one possible result, since the row is chosen at random):
   name  ignore  time
   <chr>  <dbl> <dbl>
 1 abc        0     1
 2 abc        1    32
 3 abc        0    33
 4 def        0     4
 5 def        1    35
 6 ghi        1    36
 7 ghi        1    37
 8 jkl        0     8
 9 jkl        0    39
10 jkl        0    40
11 jkl        1    41
12 jkl        1    42
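For what it's worth, the same idea can also be written without a helper column. The following is only a sketch, assuming dplyr >= 1.0.0 (for slice_sample()). The names row_id, picked and result are made up for illustration, and set.seed(1) is there only to make the draw repeatable. It tags each row with its position, keeps the rows where ignore is zero, draws one of them per group, and then applies f at exactly those positions. Groups with no zero never survive the filter, so they are left untouched.

```r
library(dplyr)

df <- data.frame(
  name = c("abc", "abc", "abc", "def", "def", "ghi", "ghi", "jkl", "jkl", "jkl", "jkl", "jkl"),
  ignore = c(0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1),
  time = 31:42
)
f <- function(x) x - 30

set.seed(1)  # assumption: only here so the random choice is reproducible

# Row positions (in the full data frame) of one randomly chosen
# ignore == 0 row per group; groups without a zero contribute nothing.
picked <- df %>%
  mutate(row_id = row_number()) %>%
  filter(ignore == 0) %>%
  group_by(name) %>%
  slice_sample(n = 1) %>%
  pull(row_id)

# Apply f only at the chosen positions; every other row keeps its time.
result <- df
result$time[picked] <- f(result$time[picked])
result
```

Because slice_sample() draws among the rows themselves, this also sidesteps the behaviour of sample(x, 1) when x is a single number, where it samples from 1:x rather than returning x.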