R Populate column based on matching rows values in two different data frames


I have two data frames, df1 and df2, that share six column names. I want to scan df2 for rows that exactly match rows in df1: if a row of df1 has an exact match in df2, its detect value should be 1, and otherwise 0. Currently every detect value in df1 is 0, and I want those to change to 1 wherever there is an exact match between the two data frames. The data look like this:

df1:
site ddate ssegment spp vtype tperiod detect
BMA 6/1/2021 1 AMRO Song 1 0
BMC 6/15/2021 1 WISN Drum 1 0
BMA 6/15/2021 1 NOFL Song 2 0
BMC 6/29/2021 2 AMRO Call 1 0
BMA 6/29/2021 2 WISN Call 2 0

df2:
site ddate ssegment spp vtype tperiod
BMA 6/1/2021 1 AMRO Call 1
BMC 6/15/2021 1 WISN Drum 1
BMA 6/15/2021 1 NOFL Song 2
BMC 6/29/2021 2 AMRO Drum 1
BMA 6/29/2021 2 WISN Call 2

After the scan, df1 should look like this:


site ddate ssegment spp vtype tperiod detect
BMA 6/1/2021 1 AMRO Song 1 0
BMC 6/15/2021 1 WISN Drum 1 1
BMA 6/15/2021 1 NOFL Song 2 1
BMC 6/29/2021 2 AMRO Call 1 0
BMA 6/29/2021 2 WISN Call 2 1

I thought the base R function merge() might be useful, but I can't quite figure it out. Thank you for your help!

Answer:

Drop the detect column from df1, add it to df2 with the value 1, and then merge:

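df1$detect = NULL   # keep detect only in df2
df2$detect = 1      # every df2 row counts as a match
result = merge(df1, unique(df2), all.x = TRUE)  # join on all shared columns
result$detect[is.na(result$detect)] = 0         # no match -> 0

This creates the detect column with 1 where there is an exact match and NA where there is not; the last line turns those NAs into 0. Note that merge() sorts the result by the join columns, so its rows may not be in the same order as df1.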

The same approach works with dplyr:

library(dplyr)

df1 %>%
  select(-detect) %>%
  left_join(df2 %>% mutate(detect = 1) %>% distinct()) %>%
  mutate(detect = coalesce(detect, 0))
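A self-contained base R sketch of the merge() approach, using the question's data read with read.table(). Because merge() sorts its output by the join columns, this sketch records df1's original row order in a helper column, row_id (a name introduced here for illustration, not part of the answer above), and restores that order after the join. It assumes R 4.0 or later, where read.table() keeps text columns as character.

```r
# The question's data, read from whitespace-separated text
df1 <- read.table(header = TRUE, text = "
site ddate ssegment spp vtype tperiod detect
BMA 6/1/2021 1 AMRO Song 1 0
BMC 6/15/2021 1 WISN Drum 1 0
BMA 6/15/2021 1 NOFL Song 2 0
BMC 6/29/2021 2 AMRO Call 1 0
BMA 6/29/2021 2 WISN Call 2 0
")
df2 <- read.table(header = TRUE, text = "
site ddate ssegment spp vtype tperiod
BMA 6/1/2021 1 AMRO Call 1
BMC 6/15/2021 1 WISN Drum 1
BMA 6/15/2021 1 NOFL Song 2
BMC 6/29/2021 2 AMRO Drum 1
BMA 6/29/2021 2 WISN Call 2
")

key_cols <- c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")

# row_id (hypothetical helper column) remembers df1's original row order,
# because merge() sorts its output by the key columns by default
lhs <- df1[key_cols]
lhs$row_id <- seq_len(nrow(lhs))

# Every distinct row of df2 is a match, so flag it with detect = 1
rhs <- unique(df2[key_cols])
rhs$detect <- 1L

result <- merge(lhs, rhs, by = key_cols, all.x = TRUE)
result$detect[is.na(result$detect)] <- 0L  # df1 rows with no match in df2
result <- result[order(result$row_id), ]   # restore df1's row order
result$row_id <- NULL
rownames(result) <- NULL
print(result)
```

With the question's data, the detect column comes out as 0, 1, 1, 0, 1 in df1's original row order, matching the expected output. Deduplicating df2 with unique() before the join matters: a df1 row matching two identical df2 rows would otherwise appear twice in the result.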

Answer:

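dplyr has filtering joins for this: anti_join() keeps the df1 rows with no match in df2 and semi_join() keeps those with a match. Label each part with 0 or 1 and bind them together: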


library(dplyr)
library(tibble)

df1 <- tribble(
  ~site,      ~ddate, ~ssegment,   ~spp, ~vtype, ~tperiod, ~detect,
  "BMA",  "6/1/2021",        1L, "AMRO", "Song",       1L,      0L,
  "BMC", "6/15/2021",        1L, "WISN", "Drum",       1L,      0L,
  "BMA", "6/15/2021",        1L, "NOFL", "Song",       2L,      0L,
  "BMC", "6/29/2021",        2L, "AMRO", "Call",       1L,      0L,
  "BMA", "6/29/2021",        2L, "WISN", "Call",       2L,      0L
)

df2 <- tribble(
  ~site,      ~ddate, ~ssegment,   ~spp, ~vtype, ~tperiod,
  "BMA",  "6/1/2021",        1L, "AMRO", "Call",       1L,
  "BMC", "6/15/2021",        1L, "WISN", "Drum",       1L,
  "BMA", "6/15/2021",        1L, "NOFL", "Song",       2L,
  "BMC", "6/29/2021",        2L, "AMRO", "Drum",       1L,
  "BMA", "6/29/2021",        2L, "WISN", "Call",       2L
)

# Unmatched rows get detect = 0, matched rows get detect = 1
bind_rows(
  df1 %>% select(-detect) %>% anti_join(df2) %>% mutate(detect = 0),
  df1 %>% select(-detect) %>% semi_join(df2) %>% mutate(detect = 1)
)
#> Joining, by = c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")
#> Joining, by = c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")
#> # A tibble: 5 x 7
#>   site  ddate     ssegment spp   vtype tperiod detect
#>   <chr> <chr>        <int> <chr> <chr>   <int>  <dbl>
#> 1 BMA   6/1/2021         1 AMRO  Song        1      0
#> 2 BMC   6/29/2021        2 AMRO  Call        1      0
#> 3 BMC   6/15/2021        1 WISN  Drum        1      1
#> 4 BMA   6/15/2021        1 NOFL  Song        2      1
#> 5 BMA   6/29/2021        2 WISN  Call        2      1

Created on 2021-12-08 by the reprex package (v2.0.1)
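For comparison, here is a base R sketch of a different technique that none of the answers use: collapse each row's six key columns into one composite string and test membership with %in%. Unlike binding the anti-join and semi-join results, this leaves df1's rows in their original order. The helper name row_key() and the "\r" separator are choices made for this illustration; the separator is assumed never to occur inside the data.

```r
# The question's data, read from whitespace-separated text
df1 <- read.table(header = TRUE, text = "
site ddate ssegment spp vtype tperiod detect
BMA 6/1/2021 1 AMRO Song 1 0
BMC 6/15/2021 1 WISN Drum 1 0
BMA 6/15/2021 1 NOFL Song 2 0
BMC 6/29/2021 2 AMRO Call 1 0
BMA 6/29/2021 2 WISN Call 2 0
")
df2 <- read.table(header = TRUE, text = "
site ddate ssegment spp vtype tperiod
BMA 6/1/2021 1 AMRO Call 1
BMC 6/15/2021 1 WISN Drum 1
BMA 6/15/2021 1 NOFL Song 2
BMC 6/29/2021 2 AMRO Drum 1
BMA 6/29/2021 2 WISN Call 2
")

key_cols <- c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")

# Hypothetical helper: paste the key columns of each row into one string.
# The "\r" separator is an assumption: it must not appear in the data.
row_key <- function(df) do.call(paste, c(df[key_cols], sep = "\r"))

# TRUE/FALSE per df1 row, converted to 1/0; df1's row order is untouched
df1$detect <- as.integer(row_key(df1) %in% row_key(df2))
print(df1)
```

The separator is the main design risk: if it could occur inside a value, two different rows might paste to the same key and be reported as a match. Both data frames also need the same column types, since paste() turns each value into text before comparing.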

Answer:

Here is a simple solution using an update join from the data.table package:


  • Code


library(data.table); setDT(df1); setDT(df2)  # convert to data.tables in place
df1[df2, on = .(site, ddate, ssegment, spp, vtype, tperiod), detect := 1L][]
  • Output

#>    site     ddate ssegment  spp vtype tperiod detect
#> 1:  BMA  6/1/2021        1 AMRO  Song       1      0
#> 2:  BMC 6/15/2021        1 WISN  Drum       1      1
#> 3:  BMA 6/15/2021        1 NOFL  Song       2      1
#> 4:  BMC 6/29/2021        2 AMRO  Call       1      0
#> 5:  BMA 6/29/2021        2 WISN  Call       2      1

Created on 2021-12-08 by the reprex package (v2.0.1)
