I have two data.frame
:
df1 <- data.frame(species = c("aa", "aa", "aa", "ab", "ab",
"ac", "ab", "aa", "ba", "bb"))
df2 <- data.frame(original.search = c("aa", "aa", "aa", "ab", "ab",
"ac", "ab", "aa", "ba", "bb"),
search.str = c("aa2", "aa2", "aa2", "ab2", "ab2",
"ac2", "ab2", "aa2", "ba2", "bb2"))
(Just for a reference, string values in each name vector are more like "Ocotea porosa", "Alchornea triplinervia" — I just used simple names to better exemplify)
What I'm trying to do is, for each match between species
from df1
and original.search
from df2
, substitute the value of species
in df1
by the search.str
value in df2
.
In the example, both data.frame
are the same size and the columns are already matched, but my real df1
has close to 100,000 rows and my df2
are the unique values found in df1$species
How to proceed?
EDIT: If no matches exist between species
and original.search
, then I need to keep the value from species
in the cell.
CodePudding user response:
Here is an option - create a named vector from the unique
rows of 'df2', and use that to match and replace the 'species' column
df1$species <- with(unique(df2), setNames(search.str,
original.search))[df1$species]
Regarding the non matches cases, we may use coalesce
after doing the same as above i.e. deframe
(from tibble
creates a named vector from the 'df2' uniqued, and then does the match and replace with original 'species' column and coalesce
with species
so that if there are any NA
s in the replaced column for a row, it gets replaced with the corresponding 'species' value
library(dplyr)
library(tibble)
df1 %>%
mutate(species = coalesce(deframe(unique(df2))[species], species))