I'm trying to merge two data frames together in R. One is a list of unique places and their accompanying attributes that all have a unique identifier for those places (Dataframe 1). The other is a list of species that come from a smaller subset of those places (dataframe 2). In DF2, there's lots of species that come from the same place (50 species from ~100 places). I'm trying to merge DF1 and DF2 so that each species has the appropriate location information assigned to it.
The columns in DF1 are a mix of Factors (1st col) and numbers
DF1
NHDPlusID StreamLeve StreamOrde StreamCalc
60003700019787 5 1 1
60003700019816 5 2 2
60003700019822 5 1 1
60003700019844 4 1 1
60003700019882 5 1 1
DF2 is also a mix of Factors, numbers, and characters
DF2
NHDPlusID Spec.
60003700019787 A
60003700019816 B
60003700019882 A
60003700019822 C
60003700019787 C
60003700019844 B
60003700019882 A
60003700019822 B
60003700019787 C
Desired Outcome
NHDPlusID Spec. StreamLeve StreamOrde StreamCalc
60003700019787 A 5 1 1
60003700019816 B 5 2 2
60003700019882 A 5 1 1
60003700019822 C 5 1 1
60003700019787 C 5 1 1
60003700019844 B 4 1 1
60003700019882 A 5 1 1
60003700019822 B 5 1 1
60003700019787 C 5 1 1
My initial code was to use
DF3<-dplyr::left_join(DF2,DF1, by="NHDPlusID")
But using this, everything from DF2 shows up and the columns from DF1 attached to the new DF3 dataframe, but all the values in the attached columns turn to NA.
NHDPlusID Spec. StreamLeve StreamOrde StreamCalc
60003700019787 A NA NA NA
60003700019816 B NA NA NA
60003700019882 A NA NA NA
60003700019822 C NA NA NA
60003700019787 C NA NA NA
60003700019844 B NA NA NA
60003700019882 A NA NA NA
60003700019822 B NA NA NA
60003700019787 C NA NA NA
CodePudding user response:
Your data:
DF1 <- data.frame(Site = c("A", "B", "C"),
Feature1 = c("XX", "XY", "ZZ"),
Feature2 = c("YY", "YX", "WW"))
DF2 <- data.frame(Individual = c(1,2,3,4,5,6,7,8,9),
Site = c("A", "B", "A", "C", "C", "B", "A", "B", "C"))
You can use the following code:
library(dplyr)
df_desired <- left_join(DF2, DF1, by = "Site")
Output:
Individual Site Feature1 Feature2
1 1 A XX YY
2 2 B XY YX
3 3 A XX YY
4 4 C ZZ WW
5 5 C ZZ WW
6 6 B XY YX
7 7 A XX YY
8 8 B XY YX
9 9 C ZZ WW
CodePudding user response:
Using base R:
df1 <- data.frame(
Site = c("A", "B", "C"),
Feature1 = c("XX", "XY", "ZZ"),
Feature2 = c("YY", "YX", "WW")
)
df2 <- data.frame(
Individual = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L),
Site = c("A", "B", "A", "C", "C", "B", "A", "B", "C")
)
merge(df2, df1, by = "Site")[c(2,1,3:4)]
#> Individual Site Feature1 Feature2
#> 1 1 A XX YY
#> 2 3 A XX YY
#> 3 7 A XX YY
#> 4 2 B XY YX
#> 5 8 B XY YX
#> 6 6 B XY YX
#> 7 5 C ZZ WW
#> 8 4 C ZZ WW
#> 9 9 C ZZ WW
CodePudding user response:
using your data for df1 and df2 copy paste :
df1 <- read_table('Site Feature1 Feature2
A XX YY
B XY YX
C ZZ WW ')
df2 <- read_table('Individual Site
1 A
2 B
3 A
4 C
5 C
6 B
7 A
8 B
9 C')
Now to get same output as yours: simply left join- df2 with df1 and not other way- by "site" and reorder the columns. Now if you assign the output to DF2 it will be saved!
df2 <- dplyr::left_join(x=df2,y=df1,by=c('Site'))%>%select(Individual,everything())
now following the steps; df2 has become :
df2
> df2
# A tibble: 9 x 4
Individual Site Feature1 Feature2
<dbl> <chr> <chr> <chr>
1 1 A XX YY
2 3 A XX YY
3 7 A XX YY
4 2 B XY YX
5 6 B XY YX
6 8 B XY YX
7 4 C ZZ WW
8 5 C ZZ WW
9 9 C ZZ WW
What is the error you are getting? Please ensure to use <-