I am trying to replace all values in nat_locx with the value from the first row in LOCX if multiple conditions are met once or more for id (my group_by() variable).
Here is an example of my data:
id DATE nat_locx LOCX distance loc_age
<fct> <date> <dbl> <dbl> <dbl> <dbl>
6553 2004-06-27 13.5 2 487.90 26
6553 2004-07-14 13.5 13.5 0 43
6553 2004-07-15 13.5 12.5 30 44
6553 2004-07-25 13.5 14.5 44.598 54
6081 2004-07-05 13 14.2 40.249 44
6081 2004-07-20 13 13.8 61.847 49
The way I have tried to do this is like so:
df<-df %>%
group_by(id) %>%
mutate(nat_locx=ifelse(loc_age>25 & loc_age<40 & distance>30, first(LOCX), nat_locx))
However, when I do this, it only replaces the first row with the first value from the LOCX column instead of all the nat_locx values for my group_by variable (id).
Ideally, I'd like this output:
id DATE nat_locx LOCX distance loc_age
<fct> <date> <dbl> <dbl> <dbl> <dbl>
6553 2004-06-27 2 2 487.90 26
6553 2004-07-14 2 13.5 0 43
6553 2004-07-15 2 12.5 30 44
6553 2004-07-25 2 14.5 44.598 54
6081 2004-07-05 13 14.2 40.249 44
6081 2004-07-20 13 13.8 61.847 49
A dplyr solution is preferred.
CodePudding user response:
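We may need replace() together with any(), so that every row of a group is overwritten once the condition is met at least once. The replacement value should be first(LOCX), not first(nat_locx):
df %>%
group_by(id) %>%
mutate(nat_locx = replace(nat_locx, rep(any(loc_age > 25 & loc_age < 40 & distance > 30), n()), first(LOCX)))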
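To make this easy to check, here is a minimal sketch that rebuilds the example data from the question and applies a group-wise test. It swaps replace() for if_else() and uses a helper column called hit, which is only an illustrative name and not part of the original data. DATE is kept as character here for brevity.

```r
library(dplyr)

# Example data rebuilt from the question (DATE left as character for brevity)
df <- tibble(
  id       = factor(c(6553, 6553, 6553, 6553, 6081, 6081)),
  DATE     = c("2004-06-27", "2004-07-14", "2004-07-15",
               "2004-07-25", "2004-07-05", "2004-07-20"),
  nat_locx = c(13.5, 13.5, 13.5, 13.5, 13, 13),
  LOCX     = c(2, 13.5, 12.5, 14.5, 14.2, 13.8),
  distance = c(487.90, 0, 30, 44.598, 40.249, 61.847),
  loc_age  = c(26, 43, 44, 54, 44, 49)
)

out <- df %>%
  group_by(id) %>%
  mutate(
    # hit (hypothetical helper) is TRUE for the whole group
    # if at least one row meets all three conditions
    hit      = any(loc_age > 25 & loc_age < 40 & distance > 30),
    nat_locx = if_else(hit, first(LOCX), nat_locx)
  ) %>%
  ungroup() %>%
  select(-hit)

out
```

With this data, only id 6553 has a row that satisfies all three conditions, so its nat_locx values all become 2 while those for id 6081 stay at 13.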
CodePudding user response:
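We could use a classic non-vectorized if else statement. The condition has to be wrapped in any() so that it collapses to a single TRUE or FALSE per group; without it, if receives a condition of length greater than one, which is an error in R 4.2.0 and later (earlier versions only used the first element, with a warning):
df %>%
group_by(id) %>%
mutate(nat_locx = if (any(loc_age > 25 &
                          loc_age < 40 &
                          distance > 30)) {
  first(LOCX)
} else {
  nat_locx
}
)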
id DATE nat_locx LOCX distance loc_age
<int> <chr> <dbl> <dbl> <dbl> <int>
1 6553 2004-06-27 2 2 488. 26
2 6553 2004-07-14 2 13.5 0 43
3 6553 2004-07-15 2 12.5 30 44
4 6553 2004-07-25 2 14.5 44.6 54
5 6081 2004-07-05 13 14.2 40.2 44
6 6081 2004-07-20 13 13.8 61.8 49
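As a rough illustration of why the ifelse() attempt in the question changed only one row, the sketch below uses plain vectors holding the values for id 6553. The variable names cond, row_wise and group_wise are made up for this example.

```r
# Values for id 6553, copied from the question
loc_age  <- c(26, 43, 44, 54)
distance <- c(487.90, 0, 30, 44.598)
LOCX     <- c(2, 13.5, 12.5, 14.5)
nat_locx <- c(13.5, 13.5, 13.5, 13.5)

# One TRUE/FALSE per row; only the first row meets all three conditions
cond <- loc_age > 25 & loc_age < 40 & distance > 30

# ifelse() decides element by element, so only rows where cond is TRUE change
row_wise <- ifelse(cond, LOCX[1], nat_locx)

# any() collapses cond to a single value, so the decision applies to the whole group
group_wise <- if (any(cond)) rep(LOCX[1], length(nat_locx)) else nat_locx

row_wise
group_wise
```

Here row_wise changes only the first element, which matches the behaviour described in the question, whereas group_wise overwrites every element.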