Conditionally replace all records for group_by if condition is met once dplyr ifelse

Time:03-31

I am trying to replace all values in nat_locx with the value from the first row of LOCX whenever a set of conditions is met at least once within an id (my group_by() variable).

Here is an example of my data:

  id         DATE       nat_locx  LOCX distance loc_age
 <fct>       <date>        <dbl> <dbl>    <dbl>   <dbl>
 6553        2004-06-27     13.5   2    487.90       26
 6553        2004-07-14     13.5  13.5    0          43
 6553        2004-07-15     13.5  12.5   30          44
 6553        2004-07-25     13.5  14.5   44.598      54
 6081        2004-07-05       13  14.2   40.249      44
 6081        2004-07-20       13  13.8   61.847      49

The way I have tried to do this is like so:

df<-df %>%
    group_by(id) %>%
    mutate(nat_locx=ifelse(loc_age>25 & loc_age<40 & distance>30, first(LOCX), nat_locx))

However, when I do this, it only replaces the first row with the first value from the LOCX column instead of all the nat_locx values for my group_by variable (id).

Ideally, I'd like this output:

  id         DATE       nat_locx  LOCX distance loc_age
 <fct>       <date>        <dbl> <dbl>    <dbl>   <dbl>
 6553        2004-06-27     2     2     487.90       26
 6553        2004-07-14     2     13.5    0          43
 6553        2004-07-15     2     12.5   30          44
 6553        2004-07-25     2     14.5   44.598      54 
 6081        2004-07-05     13    14.2   40.249      44
 6081        2004-07-20     13    13.8   61.847      49

A dplyr solution is preferred.

CodePudding user response:

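replace() can do this, but the condition has to be collapsed with any() so that it is tested once per group rather than once per row, and the replacement value should come from LOCX, not nat_locx:

df %>%
    group_by(id) %>%
    mutate(nat_locx = replace(nat_locx, any(loc_age > 25 & loc_age < 40 & distance > 30), first(LOCX)))

Because any() returns a single TRUE or FALSE, the index is recycled across the group: TRUE overwrites every row, FALSE leaves the group unchanged.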
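
To see why the original ifelse() touched only one row, it can help to compare the row-wise test with its per-group any() summary. The following is a minimal sketch on the sample data from the question; the column names row_hit and group_hit are hypothetical helpers introduced only for illustration, and the data frame is rebuilt by hand, so details such as column types are assumptions.

```r
library(dplyr)

# Sample data retyped from the question.
df <- tibble(
  id       = factor(c(6553, 6553, 6553, 6553, 6081, 6081)),
  DATE     = as.Date(c("2004-06-27", "2004-07-14", "2004-07-15",
                       "2004-07-25", "2004-07-05", "2004-07-20")),
  nat_locx = c(13.5, 13.5, 13.5, 13.5, 13, 13),
  LOCX     = c(2, 13.5, 12.5, 14.5, 14.2, 13.8),
  distance = c(487.90, 0, 30, 44.598, 40.249, 61.847),
  loc_age  = c(26, 43, 44, 54, 44, 49)
)

flags <- df %>%
  group_by(id) %>%
  mutate(
    # Row-wise test: TRUE only on the rows that meet every condition.
    row_hit   = loc_age > 25 & loc_age < 40 & distance > 30,
    # Group-wise test: TRUE on every row of a group with at least one hit.
    group_hit = any(row_hit)
  ) %>%
  ungroup()

flags$row_hit    # TRUE FALSE FALSE FALSE FALSE FALSE
flags$group_hit  # TRUE TRUE TRUE TRUE FALSE FALSE

# Use the group-level flag to overwrite nat_locx for the whole group.
result <- flags %>%
  group_by(id) %>%
  mutate(nat_locx = if_else(group_hit, first(LOCX), nat_locx)) %>%
  ungroup() %>%
  select(-row_hit, -group_hit)

result$nat_locx  # 2 2 2 2 13 13
```

Only the first row of id 6553 satisfies the row-wise test, which is why a row-wise ifelse() changes just that row; the group-level flag is what spreads the replacement to the rest of the group.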

CodePudding user response:

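We could use a classic non-vectorized if/else statement. Its condition must be a single TRUE or FALSE, so wrap the test in any(); since R 4.2.0, if() raises an error when the condition has length greater than one, and earlier versions silently used only the first element with a warning:

df %>%
  group_by(id) %>%
  mutate(nat_locx = if (any(loc_age > 25 &
                            loc_age < 40 &
                            distance > 30)) {
    first(LOCX)
  } else {
    nat_locx
  })

Output:
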
     id DATE       nat_locx  LOCX distance loc_age
  <int> <chr>         <dbl> <dbl>    <dbl>   <int>
1  6553 2004-06-27        2   2      488.       26
2  6553 2004-07-14        2  13.5      0        43
3  6553 2004-07-15        2  12.5     30        44
4  6553 2004-07-25        2  14.5     44.6      54
5  6081 2004-07-05       13  14.2     40.2      44
6  6081 2004-07-20       13  13.8     61.8      49