I have two data frames, 'df1' and 'df2', that share six column names. I want to check each row of df1 against df2: if an identical row exists in df2, the 'detect' column of df1 should be set to 1, and otherwise it should stay 0. Currently all values of 'detect' in df1 are 0. It would look like this:
df1
site | ddate | ssegment | spp | vtype | tperiod | detect |
---|---|---|---|---|---|---|
BMA | 6/1/2021 | 1 | AMRO | Song | 1 | 0 |
BMC | 6/15/2021 | 1 | WISN | Drum | 1 | 0 |
BMA | 6/15/2021 | 1 | NOFL | Song | 2 | 0 |
BMC | 6/29/2021 | 2 | AMRO | Call | 1 | 0 |
BMA | 6/29/2021 | 2 | WISN | Call | 2 | 0 |
df2
site | ddate | ssegment | spp | vtype | tperiod |
---|---|---|---|---|---|
BMA | 6/1/2021 | 1 | AMRO | Call | 1 |
BMC | 6/15/2021 | 1 | WISN | Drum | 1 |
BMA | 6/15/2021 | 1 | NOFL | Song | 2 |
BMC | 6/29/2021 | 2 | AMRO | Drum | 1 |
BMA | 6/29/2021 | 2 | WISN | Call | 2 |
After scanning these, df1 would now look like:
df1
site | ddate | ssegment | spp | vtype | tperiod | detect |
---|---|---|---|---|---|---|
BMA | 6/1/2021 | 1 | AMRO | Song | 1 | 0 |
BMC | 6/15/2021 | 1 | WISN | Drum | 1 | 1 |
BMA | 6/15/2021 | 1 | NOFL | Song | 2 | 1 |
BMC | 6/29/2021 | 2 | AMRO | Call | 1 | 0 |
BMA | 6/29/2021 | 2 | WISN | Call | 2 | 1 |
I was thinking that R base function 'merge' might be useful, but I can't quite figure it out. Thank you for your help!
CodePudding user response:
Start with the detect column only in df2, then merge:
df1$detect = NULL
df2$detect = 1
result = merge(df1, unique(df2), all.x = TRUE)
This will create the detect column as 1s when there are exact matches and NAs when there are not. If you want, you can change the NAs to 0s.
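A minimal end-to-end sketch of the merge approach, including the optional NA-to-0 step. The data frames are rebuilt here from the tables in the question, and the name result is just the one used above; treat this as an illustration rather than the only way to do it. Note that merge may return the rows in a different order than df1.

```r
# Example data reconstructed from the question's tables
df1 <- data.frame(
  site     = c("BMA", "BMC", "BMA", "BMC", "BMA"),
  ddate    = c("6/1/2021", "6/15/2021", "6/15/2021", "6/29/2021", "6/29/2021"),
  ssegment = c(1L, 1L, 1L, 2L, 2L),
  spp      = c("AMRO", "WISN", "NOFL", "AMRO", "WISN"),
  vtype    = c("Song", "Drum", "Song", "Call", "Call"),
  tperiod  = c(1L, 1L, 2L, 1L, 2L),
  detect   = 0L
)
# df2 has the same six columns but different vtype values in rows 1 and 4
df2 <- df1[, setdiff(names(df1), "detect")]
df2$vtype <- c("Call", "Drum", "Song", "Drum", "Call")

# Move the flag to df2, then left-join on all shared columns
df1$detect <- NULL
df2$detect <- 1L
result <- merge(df1, unique(df2), all.x = TRUE)

# Rows of df1 without a match come back as NA; turn those into 0
result$detect[is.na(result$detect)] <- 0L
result
```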
The same method can work with dplyr:
library(dplyr)
df1 %>%
  select(-detect) %>%
  left_join(
    df2 %>% mutate(detect = 1) %>% unique()
  )
CodePudding user response:
There are anti_join and semi_join for filtering joins of two tables:
library(tidyverse)
df1 <- tribble(
~site, ~ddate, ~ssegment, ~spp, ~vtype, ~tperiod, ~detect,
"BMA", "6/1/2021", 1L, "AMRO", "Song", 1L, 0L,
"BMC", "6/15/2021", 1L, "WISN", "Drum", 1L, 0L,
"BMA", "6/15/2021", 1L, "NOFL", "Song", 2L, 0L,
"BMC", "6/29/2021", 2L, "AMRO", "Call", 1L, 0L,
"BMA", "6/29/2021", 2L, "WISN", "Call", 2L, 0L
)
df2 <- tibble::tribble(
~site, ~ddate, ~ssegment, ~spp, ~vtype, ~tperiod,
"BMA", "6/1/2021", 1L, "AMRO", "Call", 1L,
"BMC", "6/15/2021", 1L, "WISN", "Drum", 1L,
"BMA", "6/15/2021", 1L, "NOFL", "Song", 2L,
"BMC", "6/29/2021", 2L, "AMRO", "Drum", 1L,
"BMA", "6/29/2021", 2L, "WISN", "Call", 2L
)
bind_rows(
df1 %>% select(-detect) %>% anti_join(df2) %>% mutate(detect = 0),
df1 %>% select(-detect) %>% semi_join(df2) %>% mutate(detect = 1)
)
#> Joining, by = c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")
#> Joining, by = c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")
#> # A tibble: 5 x 7
#> site ddate ssegment spp vtype tperiod detect
#> <chr> <chr> <int> <chr> <chr> <int> <dbl>
#> 1 BMA 6/1/2021 1 AMRO Song 1 0
#> 2 BMC 6/29/2021 2 AMRO Call 1 0
#> 3 BMC 6/15/2021 1 WISN Drum 1 1
#> 4 BMA 6/15/2021 1 NOFL Song 2 1
#> 5 BMA 6/29/2021 2 WISN Call 2 1
Created on 2021-12-08 by the reprex package (v2.0.1)
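The bind_rows output above lists the unmatched rows first, so the original row order of df1 is lost. If that order matters, one possible alternative (a different technique from the joins, sketched here in base R) is to build a text key per row and test membership. The helper row_key and the separator character are assumptions made for this example; the approach only works if the separator never appears in the data.

```r
# Example data reconstructed from the question's tables
df1 <- data.frame(
  site     = c("BMA", "BMC", "BMA", "BMC", "BMA"),
  ddate    = c("6/1/2021", "6/15/2021", "6/15/2021", "6/29/2021", "6/29/2021"),
  ssegment = c(1L, 1L, 1L, 2L, 2L),
  spp      = c("AMRO", "WISN", "NOFL", "AMRO", "WISN"),
  vtype    = c("Song", "Drum", "Song", "Call", "Call"),
  tperiod  = c(1L, 1L, 2L, 1L, 2L),
  detect   = 0L
)
df2 <- df1[, setdiff(names(df1), "detect")]
df2$vtype <- c("Call", "Drum", "Song", "Drum", "Call")

# The six columns that must all agree for a row to count as a match
key_cols <- c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")

# Hypothetical helper: paste the key columns of each row into one string.
# "\x1f" (unit separator) is assumed not to occur in any of the values.
row_key <- function(d) do.call(paste, c(d[key_cols], sep = "\x1f"))

# 1 where the row of df1 also exists in df2, 0 otherwise; row order is untouched
df1$detect <- as.integer(row_key(df1) %in% row_key(df2))
df1
```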
CodePudding user response:
Please find one possible and very simple solution with the data.table library.
Reprex
- Code
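library(data.table)
# Convert both data frames to data.tables in place
setDT(df1)
setDT(df2)
# Update join: rows of df1 that match a row of df2 on all six columns get detect := TRUE
# (shown as 1 in the existing detect column); the trailing [] prints the result
df1[df2, on = .(site, ddate, ssegment, spp, vtype, tperiod), detect := TRUE][]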
- Output
#> site ddate ssegment spp vtype tperiod detect
#> 1: BMA 6/1/2021 1 AMRO Song 1 0
#> 2: BMC 6/15/2021 1 WISN Drum 1 1
#> 3: BMA 6/15/2021 1 NOFL Song 2 1
#> 4: BMC 6/29/2021 2 AMRO Call 1 0
#> 5: BMA 6/29/2021 2 WISN Call 2 1
Created on 2021-12-08 by the reprex package (v2.0.1)