Home > Blockchain >  if condition is met replace all but not if another condition is met dplyr in R
if condition is met replace all but not if another condition is met dplyr in R

Time:04-01

I am trying to replace all values in nat_locx with the value from the first row in LOCX and replace all the distance values with 0 if my first condition is met one or more times for id (my group_by() variable), but NOT if my second condition is met one or more times for id.

Here is an example of my data:

  id         DATE       nat_locx  LOCX distance loc_age  condition
 <fct>       <date>        <dbl> <dbl>    <dbl>   <dbl>  <lgl>
 6553        2004-06-27     13.5   2    487.90       26  TRUE
 6553        2004-07-14     13.5  13.5    0          43  FALSE
 6553        2004-07-15     13.5  12.5   30          44  FALSE  
 10160       2005-07-01      4.5    12 229.45588     36  TRUE          
 10160       2005-07-05      4.5    11 200.12496     40  TRUE     
 10160       2005-07-06      4.5    11 200.12496     41  TRUE

The way I have tried to do this is like so:

df<-df %>%
  group_by(id) %>%
  mutate(condition = case_when(     
    loc_age >= 25 & loc_age < 40 & distance > 30 ~ TRUE, 
    loc_age>=40 & loc_age<50 & distance>60 ~ TRUE,
    TRUE ~ FALSE)) %>%  
mutate(nat_locx=if(condition=="TRUE") {
        first(LOCX) & distance==0.00
        } else {
        nat_locx})

The first mutate() results in a new column with TRUE and FALSE values. If there is even one instance of FALSE, then the if else statement I write afterwards should not proceed.

In this example, this would mean that for id==6553 the loop should not change anything. But, because condition==TRUE for every row for id==10160 then the if else should proceed.

Ideally, I'd like this output:

  id         DATE       nat_locx  LOCX distance loc_age  condition
 <fct>       <date>        <dbl> <dbl>    <dbl>   <dbl>  <lgl>
 6553        2004-06-27     13.5   2    487.90       26  TRUE
 6553        2004-07-14     13.5  13.5    0          43  FALSE
 6553        2004-07-15     13.5  12.5   30          44  FALSE  
 10160       2005-07-01      12    12     0          36  TRUE          
 10160       2005-07-05      12    11     0          40  TRUE     
 10160       2005-07-06      12    11     0          41  TRUE

A dplyr solution is preferred.

CodePudding user response:

As @Ben mentioned, we can include all so that the changes are only applied to groups that have all TRUE. We can use this for both nat_locx and for the distance columns.

library(tidyverse)


df %>%
  group_by(id) %>%
  mutate(
    condition = case_when(
      loc_age >= 25 & loc_age < 40 & distance > 30 ~ TRUE,
      loc_age >= 40 & loc_age < 50 & distance > 60 ~ TRUE,
      TRUE ~ FALSE
    )
  ) %>%
  mutate(nat_locx = if (all(condition)) first(LOCX) else nat_locx,
         distance = if (all(condition)) 0 else distance)

Output

     id DATE       nat_locx  LOCX distance loc_age condition
  <int> <chr>         <dbl> <dbl>    <dbl>   <int> <lgl>    
1  6553 2004-06-27     13.5   2       488.      26 TRUE     
2  6553 2004-07-14     13.5  13.5       0       43 FALSE    
3  6553 2004-07-15     13.5  12.5      30       44 FALSE    
4 10160 2005-07-01     12    12         0       36 TRUE     
5 10160 2005-07-05     12    11         0       40 TRUE     
6 10160 2005-07-06     12    11         0       41 TRUE     
  • Related