Is there a way in R to merge data dataframes that have repeated keys?-CodePudding

I'm trying to merge two data frames together in R. One is a list of unique places and their accompanying attributes that all have a unique identifier for those places (Dataframe 1). The other is a list of species that come from a smaller subset of those places (dataframe 2). In DF2, there's lots of species that come from the same place (50 species from ~100 places). I'm trying to merge DF1 and DF2 so that each species has the appropriate location information assigned to it.

The columns in DF1 are a mix of Factors (1st col) and numbers

DF1

NHDPlusID      StreamLeve   StreamOrde  StreamCalc  
60003700019787  5             1          1  
60003700019816  5             2          2  
60003700019822  5             1          1  
60003700019844  4             1          1  
60003700019882  5             1          1

DF2 is also a mix of Factors, numbers, and characters

DF2

NHDPlusID      Spec.
60003700019787  A
60003700019816  B
60003700019882  A
60003700019822  C
60003700019787  C
60003700019844  B
60003700019882  A
60003700019822  B
60003700019787  C

Desired Outcome

NHDPlusID      Spec. StreamLeve StreamOrde StreamCalc
60003700019787  A    5          1          1
60003700019816  B    5          2          2    
60003700019882  A    5          1          1
60003700019822  C    5          1          1    
60003700019787  C    5          1          1
60003700019844  B    4          1          1
60003700019882  A    5          1          1
60003700019822  B    5          1          1
60003700019787  C    5          1          1

My initial code was to use

DF3<-dplyr::left_join(DF2,DF1, by="NHDPlusID")

But using this, everything from DF2 shows up and the columns from DF1 attached to the new DF3 dataframe, but all the values in the attached columns turn to NA.

NHDPlusID      Spec. StreamLeve StreamOrde StreamCalc
60003700019787  A    NA          NA          NA
60003700019816  B    NA          NA          NA
60003700019882  A    NA          NA          NA
60003700019822  C    NA          NA          NA
60003700019787  C    NA          NA          NA
60003700019844  B    NA          NA          NA
60003700019882  A    NA          NA          NA
60003700019822  B    NA          NA          NA
60003700019787  C    NA          NA          NA

CodePudding user response：

Your data:

DF1 <- data.frame(Site = c("A", "B", "C"),
                  Feature1 = c("XX", "XY", "ZZ"),
                  Feature2 = c("YY", "YX", "WW"))

DF2 <- data.frame(Individual = c(1,2,3,4,5,6,7,8,9),
                  Site = c("A", "B", "A", "C", "C", "B", "A", "B", "C"))

You can use the following code:

library(dplyr)

df_desired <- left_join(DF2, DF1, by = "Site")

Output:

  Individual Site Feature1 Feature2
1          1    A       XX       YY
2          2    B       XY       YX
3          3    A       XX       YY
4          4    C       ZZ       WW
5          5    C       ZZ       WW
6          6    B       XY       YX
7          7    A       XX       YY
8          8    B       XY       YX
9          9    C       ZZ       WW

CodePudding user response：

Using base R:

df1 <- data.frame(
  Site = c("A", "B", "C"),
  Feature1 = c("XX", "XY", "ZZ"),
  Feature2 = c("YY", "YX", "WW")
)

df2 <- data.frame(
  Individual = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L),
  Site = c("A", "B", "A", "C", "C", "B", "A", "B", "C")
)
merge(df2, df1, by = "Site")[c(2,1,3:4)]

#>   Individual Site Feature1 Feature2
#> 1          1    A       XX       YY
#> 2          3    A       XX       YY
#> 3          7    A       XX       YY
#> 4          2    B       XY       YX
#> 5          8    B       XY       YX
#> 6          6    B       XY       YX
#> 7          5    C       ZZ       WW
#> 8          4    C       ZZ       WW
#> 9          9    C       ZZ       WW

CodePudding user response：

using your data for df1 and df2 copy paste :

df1 <- read_table('Site Feature1 Feature2
A     XX      YY
B     XY      YX
C     ZZ      WW ')

df2 <- read_table('Individual Site
1          A
2          B
3          A
4          C
5          C
6          B
7          A
8          B
9          C')

Now to get same output as yours: simply left join- df2 with df1 and not other way- by "site" and reorder the columns. Now if you assign the output to DF2 it will be saved!

df2 <- dplyr::left_join(x=df2,y=df1,by=c('Site'))%>%select(Individual,everything())

now following the steps; df2 has become :

df2
> df2
# A tibble: 9 x 4
  Individual Site  Feature1 Feature2
       <dbl> <chr> <chr>    <chr>   
1          1 A     XX       YY      
2          3 A     XX       YY      
3          7 A     XX       YY      
4          2 B     XY       YX      
5          6 B     XY       YX      
6          8 B     XY       YX      
7          4 C     ZZ       WW      
8          5 C     ZZ       WW      
9          9 C     ZZ       WW

What is the error you are getting? Please ensure to use <-