Home > front end >  How to conditionally update a R tibble using multiple conditions of another tibble
How to conditionally update a R tibble using multiple conditions of another tibble

Time:04-24

I have two tables. I would like to update the first table using a second table using multiple conditions. In base R I would use if...else type constructs to do this but would like to know how to achieve this using dplyr.

The table to be updated (have a field added) looks like this:

> Intvs
# A tibble: 12 x 3
   Group  From    To
   <chr> <dbl> <dbl>
 1 A         0     1
 2 A         1     2
 3 A         2     3
 4 A         3     4
 5 A         4     5
 6 A         5     6
 7 B         0     1
 8 B         1     2
 9 B         2     3
10 B         3     4
11 B         4     5
12 B         5     6

The tibble that I would like to use to make the update looks like this:

 >Zns
# A tibble: 2 x 4
  Group From     To  Zone
  <chr> <chr> <dbl> <dbl>
1 A     X         1     5
2 B     Y         3     4

I would like to update the Intvs tibble with the Zns tibble using the fields == Group, >= From, and <= To to control the update. The expected output should look like this

> Intvs
# A tibble: 12 x 4
   Group  From    To  Zone
   <chr> <dbl> <dbl> <chr>
 1 A         0     1  X
 2 A         1     2  X
 3 A         2     3  X
 4 A         3     4  X
 5 A         4     5  X
 6 A         5     6  NA
 7 B         0     1  NA
 8 B         1     2  NA
 9 B         2     3  NA
10 B         3     4  Y
11 B         4     5  NA
12 B         5     6  NA

What is the most efficient way to do this using dplyr?

The code below should make the dummy tables Intv and Zns

# load packages
require(tidyverse)

# Intervals table
a <- c(rep("A", 6), rep("B", 6))
b <- c(seq(0,5,1), seq(0,5,1) )
c <- c(seq(1,6,1), seq(1,6,1))
Intvs <- bind_cols(a, b, c) 
names(Intvs) <- c("Group", "From", "To")

# Zones table
a <- c("A", "B")
b <- c("X", "Y")
c <- c(1, 3)
d <- c(5, 4)
Zns <- bind_cols(a, b, c, d) 
names(Zns) <- c("Group", "From", "To", "Zone")

CodePudding user response:

This is the closest I get. It is not giving the expected output:

library(dplyr)
left_join(Intvs, Zns, by="Group") %>% 
  group_by(Group) %>% 
  mutate(Zone1 = case_when(From.x <= Zone & From.x >= To.y ~ From.y)) %>% 
  select(Group, From=From.x, To=To.x, Zone = Zone1)
   Group  From    To Zone 
   <chr> <dbl> <dbl> <chr>
 1 A         0     1 NA   
 2 A         1     2 X    
 3 A         2     3 X    
 4 A         3     4 X    
 5 A         4     5 X    
 6 A         5     6 X    
 7 B         0     1 NA   
 8 B         1     2 NA   
 9 B         2     3 NA   
10 B         3     4 Y    
11 B         4     5 Y    
12 B         5     6 NA 

CodePudding user response:

Not sure why your first row does not give NA, since 0 - 1 is not in the range of 1 - 5.

First left_join the two dataframes using the Group column. Here I assign the suffix "_Zns" to values from the Zns dataframe. Then use a single case_when or (ifelse) statement to assign NA to rows that do not fit the range. Finally, drop the columns that end with Zns.

library(dplyr)

left_join(Intvs, Zns, by = "Group", suffix = c("", "_Zns")) %>% 
  mutate(Zone = case_when(From >= From_Zns & To <= To_Zns ~ Zone,
                           TRUE ~ NA_character_)) %>% 
  select(-ends_with("Zns"))

# A tibble: 12 × 4
   Group  From    To Zone 
   <chr> <dbl> <dbl> <chr>
 1 A         0     1 NA   
 2 A         1     2 X    
 3 A         2     3 X    
 4 A         3     4 X    
 5 A         4     5 X    
 6 A         5     6 NA   
 7 B         0     1 NA   
 8 B         1     2 NA   
 9 B         2     3 NA   
10 B         3     4 Y    
11 B         4     5 NA   
12 B         5     6 NA   

Data

Note that I have changed your column name order in the Zns dataframe.

a <- c(rep("A", 6), rep("B", 6))
b <- c(seq(0,5,1), seq(0,5,1) )
c <- c(seq(1,6,1), seq(1,6,1))
Intvs <- bind_cols(a, b, c) 
names(Intvs) <- c("Group", "From", "To")

# Zones table
a <- c("A", "B")
b <- c("X", "Y")
c <- c(1, 3)
d <- c(5, 4)
Zns <- bind_cols(a, b, c, d)
colnames(Zns) <- c("Group", "Zone", "From", "To")

CodePudding user response:

Using non-equi join from data.table

library(data.table)
setDT(Intvs)[Zns, Zone := Zone, on = .(Group, From >= From, To <= To)]

-output

> Intvs
     Group  From    To   Zone
    <char> <num> <num> <char>
 1:      A     0     1   <NA>
 2:      A     1     2      X
 3:      A     2     3      X
 4:      A     3     4      X
 5:      A     4     5      X
 6:      A     5     6   <NA>
 7:      B     0     1   <NA>
 8:      B     1     2   <NA>
 9:      B     2     3   <NA>
10:      B     3     4      Y
11:      B     4     5   <NA>
12:      B     5     6   <NA>
  • Related