I have the following data. First, I have i different plots that I want to analyze. In each plot there are j species that I want to obtain some information about, for example:
plot1 = c(rep(1, 3), rep(2, 4), rep(3, 5))
spp1 = c('a', 'b', 'c', 'a', 'b', 'c', 'd', 'b', 'b', 'b', 'e', 'f')
data.1 = data.frame(plot1, spp1)
The same kind of information is stored in a second data frame with a similar structure:
plot2 = c(rep(1, 2), rep(2, 3), rep(3, 5))
spp2 = c('a', 'a', 'b', 'c', 'c', 'b', 'b', 'b', 'e', 'f')
data.2 = data.frame(plot2, spp2)
For each plot i, I want to compute setdiff(unique(data.1$spp1), unique(data.2$spp2)) using only the rows that belong to that plot,
and add the result to a data frame with two columns: plot and spp_name.
For the example datasets, the final data frame should look like this:
df_result = data.frame(plot = c(1, 1, 2, 2, 3), spp_name = c('b', 'c', 'a', 'd', 0))
A 0 (or a similar placeholder) should be returned when setdiff() gives character(0). In other words, for each plot i, df_result
should have one row per species that is in data.1$spp1 but not in data.2$spp2 for that plot, or a single placeholder row if there is none.
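A minimal sketch of the per-plot step described above, assuming the corrected example data. The helper name spp_only_in_1() is hypothetical, and the placeholder is the string "0" (a character value, since spp_name holds species names).

```r
# Example data from the question (with the plot2/spp2 syntax corrected)
data.1 <- data.frame(plot1 = c(rep(1, 3), rep(2, 4), rep(3, 5)),
                     spp1  = c('a', 'b', 'c', 'a', 'b', 'c', 'd',
                               'b', 'b', 'b', 'e', 'f'),
                     stringsAsFactors = FALSE)
data.2 <- data.frame(plot2 = c(rep(1, 2), rep(2, 3), rep(3, 5)),
                     spp2  = c('a', 'a', 'b', 'c', 'c',
                               'b', 'b', 'b', 'e', 'f'),
                     stringsAsFactors = FALSE)

# Hypothetical helper: species recorded in plot i of data.1 but not in
# plot i of data.2; returns the placeholder "0" instead of character(0)
spp_only_in_1 <- function(i) {
  out <- setdiff(unique(data.1$spp1[data.1$plot1 == i]),
                 unique(data.2$spp2[data.2$plot2 == i]))
  if (length(out) == 0) "0" else out
}

spp_only_in_1(1)  # "b" "c"
spp_only_in_1(3)  # "0"
```

Subsetting each species column with == i before calling setdiff() is what restricts the comparison to a single plot; comparing the whole columns would mix species from different plots.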
My first attempt was a for loop over the plots. Getting the setdiff() result for each plot works, but I don't know how to add it to an empty data frame. Do I need an inner loop over the species? I hope the question is clear.
Thanks in advance.
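On the question of filling an empty data frame inside a loop: a common R pattern is not to grow the data frame row by row, but to collect one small data frame per plot in a list and bind them once after the loop. A sketch under that assumption; the object names pieces and only_in_1 are illustrative, and "0" is the chosen placeholder.

```r
# Example data from the question (with the plot2/spp2 syntax corrected)
data.1 <- data.frame(plot1 = c(rep(1, 3), rep(2, 4), rep(3, 5)),
                     spp1  = c('a', 'b', 'c', 'a', 'b', 'c', 'd',
                               'b', 'b', 'b', 'e', 'f'),
                     stringsAsFactors = FALSE)
data.2 <- data.frame(plot2 = c(rep(1, 2), rep(2, 3), rep(3, 5)),
                     spp2  = c('a', 'a', 'b', 'c', 'c',
                               'b', 'b', 'b', 'e', 'f'),
                     stringsAsFactors = FALSE)

pieces <- list()  # one small data frame per plot, bound together at the end
for (i in unique(data.1$plot1)) {
  only_in_1 <- setdiff(unique(data.1$spp1[data.1$plot1 == i]),
                       unique(data.2$spp2[data.2$plot2 == i]))
  if (length(only_in_1) == 0) only_in_1 <- "0"  # placeholder: nothing missing
  # data.frame() recycles the single plot number to the number of species
  pieces[[length(pieces) + 1]] <- data.frame(plot = i, spp_name = only_in_1,
                                             stringsAsFactors = FALSE)
}
df_result <- do.call(rbind, pieces)  # stack all per-plot pieces at once
df_result
```

Because data.frame() recycles the single plot number to the length of the species vector, no inner loop over species is needed.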
Answer:
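You could use anti_join() to keep the species of data.1 that have no match in the same plot of data.2,
and then add a row for every plot that has no species left: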
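library(dplyr)
# keep (plot, species) pairs of data.1 with no match in data.2;
# distinct() drops repeated species within a plot, like unique() in the question
anti_join(distinct(data.1), data.2, by = c("plot1" = "plot2", "spp1" = "spp2")) %>%
  # add an NA row for each plot that has nothing left after the anti-join
  add_row(plot1 = setdiff(data.1$plot1, .$plot1))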
# plot1 spp1
#1 1 b
#2 1 c
#3 2 a
#4 2 d
#5 3 <NA>
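To reproduce the exact df_result from the question (columns plot and spp_name, with "0" instead of NA), the pipeline can be extended with dplyr's rename() and coalesce(). A sketch, assuming a current dplyr installation; the distinct() call is an addition so that a species repeated within a plot is reported only once, mirroring unique() in the question.

```r
library(dplyr)

# Example data from the question (with the plot2/spp2 syntax corrected)
data.1 <- data.frame(plot1 = c(rep(1, 3), rep(2, 4), rep(3, 5)),
                     spp1  = c('a', 'b', 'c', 'a', 'b', 'c', 'd',
                               'b', 'b', 'b', 'e', 'f'),
                     stringsAsFactors = FALSE)
data.2 <- data.frame(plot2 = c(rep(1, 2), rep(2, 3), rep(3, 5)),
                     spp2  = c('a', 'a', 'b', 'c', 'c',
                               'b', 'b', 'b', 'e', 'f'),
                     stringsAsFactors = FALSE)

df_result <- anti_join(distinct(data.1), data.2,
                       by = c("plot1" = "plot2", "spp1" = "spp2")) %>%
  # one NA row for each plot with no species left after the anti-join
  add_row(plot1 = setdiff(data.1$plot1, .$plot1)) %>%
  # use the column names of the desired result
  rename(plot = plot1, spp_name = spp1) %>%
  # replace the NA placeholder with "0", as requested in the question
  mutate(spp_name = coalesce(spp_name, "0"))

df_result
```

coalesce() returns the first non-missing value element-wise, so only the NA added by add_row() is replaced.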