Find multiple matching columns and pull value from another dataset in R-CodePudding

I have two datasets of sales data at the end of month. I am just trying to find the matching rows from the dataset A and pull the shipped value from the dataset B. I have tried using merge and match but the values don't line up or the dataset explodes. I made a small example the real dataset has over 100 columns and around 500,00 rows but I didn't feel they were relevant.

Data set A

First Name	Last Name	Order	Shipped
John	S	300	Y
Tim	B	331	Y
Kathy	J	365	N
Clayton	S	362	Y
Ashley	R	364	N
John	A	321	N
John	S	388	Y
Ashley	R	338	N

Dataset B

First Name	Last Name	Order	Shipped
John	S	300	Y
Tim	B	331	N
Kathy	J	365	N
Clayton	S	362	Y
Ashley	R	364	Y
John	A	321	Y
Jake	K	333	N
Bobby	J	398	N

Desired output

First Name	Last Name	Order	Shipped A	Shipped B
John	S	300	Y	Y
Tim	B	331	Y	N
Kathy	J	365	N	N
Clayton	S	362	Y	Y
Ashley	R	364	N	Y
John	A	321	N	Y
John	S	388	Y	N/A
Ashley	R	338	N	N/A

CodePudding user response：

A base R solution using merge

merge(df1, df2, c("First_Name", "Last_Name", "Order"), 
        suffixes = c(" A"," B"), all.x = T)
  First_Name Last_Name Order Shipped A Shipped B
1     Ashley         R   338         N      <NA>
2     Ashley         R   364         N         Y
3    Clayton         S   362         Y         Y
4       John         A   321         N         Y
5       John         S   300         Y         Y
6       John         S   388         Y      <NA>
7      Kathy         J   365         N         N
8        Tim         B   331         Y         N

Data

df1 <- structure(list(First_Name = c("John", "Tim", "Kathy", "Clayton",
"Ashley", "John", "John", "Ashley"), Last_Name = c("S", "B",
"J", "S", "R", "A", "S", "R"), Order = c(300L, 331L, 365L, 362L,
364L, 321L, 388L, 338L), Shipped = c("Y", "Y", "N", "Y", "N",
"N", "Y", "N")), class = "data.frame", row.names = c(NA, -8L))

df2 <- structure(list(First_Name = c("John", "Tim", "Kathy", "Clayton",
"Ashley", "John", "Jake", "Bobby"), Last_Name = c("S", "B", "J",
"S", "R", "A", "K", "J"), Order = c(300L, 331L, 365L, 362L, 364L,
321L, 333L, 398L), Shipped = c("Y", "N", "N", "Y", "Y", "Y",
"N", "N")), class = "data.frame", row.names = c(NA, -8L))

CodePudding user response：

It's hard to tell what went wrong without seeing your code. However, this line does the trick?

library(tidyverse)

# define data
dfa <- tibble::tribble(
  ~First.Name, ~Last.Name, ~Order, ~Shipped,
       "John",        "S",   300L,      "Y",
        "Tim",        "B",   331L,      "Y",
      "Kathy",        "J",   365L,      "N",
    "Clayton",        "S",   362L,      "Y",
     "Ashley",        "R",   364L,      "N",
       "John",        "A",   321L,      "N",
       "John",        "S",   388L,      "Y",
     "Ashley",        "R",   338L,      "N"
  )

dfb <- tibble::tribble(
         ~First.Name, ~Last.Name, ~Order, ~Shipped,
              "John",        "S",   300L,      "Y",
               "Tim",        "B",   331L,      "N",
             "Kathy",        "J",   365L,      "N",
           "Clayton",        "S",   362L,      "Y",
            "Ashley",        "R",   364L,      "Y",
              "John",        "A",   321L,      "Y",
              "Jake",        "K",   333L,      "N",
             "Bobby",        "J",   398L,      "N"
  )

# merge 
left_join(dfa, dfb, by = c("First.Name", "Last.Name", "Order"))
#> # A tibble: 8 × 5
#>   First.Name Last.Name Order Shipped.x Shipped.y
#>   <chr>      <chr>     <int> <chr>     <chr>    
#> 1 John       S           300 Y         Y        
#> 2 Tim        B           331 Y         N        
#> 3 Kathy      J           365 N         N        
#> 4 Clayton    S           362 Y         Y        
#> 5 Ashley     R           364 N         Y        
#> 6 John       A           321 N         Y        
#> 7 John       S           388 Y         <NA>     
#> 8 Ashley     R           338 N         <NA>

^{Created on 2022-11-17 with reprex v2.0.2}