Home > other >  Find multiple matching columns and pull value from another dataset in R
Find multiple matching columns and pull value from another dataset in R

Time:11-18

I have two datasets of sales data at the end of month. I am just trying to find the matching rows from the dataset A and pull the shipped value from the dataset B. I have tried using merge and match but the values don't line up or the dataset explodes. I made a small example the real dataset has over 100 columns and around 500,00 rows but I didn't feel they were relevant.

Data set A

First Name Last Name Order Shipped
John S 300 Y
Tim B 331 Y
Kathy J 365 N
Clayton S 362 Y
Ashley R 364 N
John A 321 N
John S 388 Y
Ashley R 338 N

Dataset B

First Name Last Name Order Shipped
John S 300 Y
Tim B 331 N
Kathy J 365 N
Clayton S 362 Y
Ashley R 364 Y
John A 321 Y
Jake K 333 N
Bobby J 398 N

Desired output

First Name Last Name Order Shipped A Shipped B
John S 300 Y Y
Tim B 331 Y N
Kathy J 365 N N
Clayton S 362 Y Y
Ashley R 364 N Y
John A 321 N Y
John S 388 Y N/A
Ashley R 338 N N/A

CodePudding user response:

A base R solution using merge

merge(df1, df2, c("First_Name", "Last_Name", "Order"), 
        suffixes = c(" A"," B"), all.x = T)
  First_Name Last_Name Order Shipped A Shipped B
1     Ashley         R   338         N      <NA>
2     Ashley         R   364         N         Y
3    Clayton         S   362         Y         Y
4       John         A   321         N         Y
5       John         S   300         Y         Y
6       John         S   388         Y      <NA>
7      Kathy         J   365         N         N
8        Tim         B   331         Y         N

Data

df1 <- structure(list(First_Name = c("John", "Tim", "Kathy", "Clayton",
"Ashley", "John", "John", "Ashley"), Last_Name = c("S", "B",
"J", "S", "R", "A", "S", "R"), Order = c(300L, 331L, 365L, 362L,
364L, 321L, 388L, 338L), Shipped = c("Y", "Y", "N", "Y", "N",
"N", "Y", "N")), class = "data.frame", row.names = c(NA, -8L))

df2 <- structure(list(First_Name = c("John", "Tim", "Kathy", "Clayton",
"Ashley", "John", "Jake", "Bobby"), Last_Name = c("S", "B", "J",
"S", "R", "A", "K", "J"), Order = c(300L, 331L, 365L, 362L, 364L,
321L, 333L, 398L), Shipped = c("Y", "N", "N", "Y", "Y", "Y",
"N", "N")), class = "data.frame", row.names = c(NA, -8L))

CodePudding user response:

It's hard to tell what went wrong without seeing your code. However, this line does the trick?

library(tidyverse)

# define data
dfa <- tibble::tribble(
  ~First.Name, ~Last.Name, ~Order, ~Shipped,
       "John",        "S",   300L,      "Y",
        "Tim",        "B",   331L,      "Y",
      "Kathy",        "J",   365L,      "N",
    "Clayton",        "S",   362L,      "Y",
     "Ashley",        "R",   364L,      "N",
       "John",        "A",   321L,      "N",
       "John",        "S",   388L,      "Y",
     "Ashley",        "R",   338L,      "N"
  )

dfb <- tibble::tribble(
         ~First.Name, ~Last.Name, ~Order, ~Shipped,
              "John",        "S",   300L,      "Y",
               "Tim",        "B",   331L,      "N",
             "Kathy",        "J",   365L,      "N",
           "Clayton",        "S",   362L,      "Y",
            "Ashley",        "R",   364L,      "Y",
              "John",        "A",   321L,      "Y",
              "Jake",        "K",   333L,      "N",
             "Bobby",        "J",   398L,      "N"
  )

# merge 
left_join(dfa, dfb, by = c("First.Name", "Last.Name", "Order"))
#> # A tibble: 8 × 5
#>   First.Name Last.Name Order Shipped.x Shipped.y
#>   <chr>      <chr>     <int> <chr>     <chr>    
#> 1 John       S           300 Y         Y        
#> 2 Tim        B           331 Y         N        
#> 3 Kathy      J           365 N         N        
#> 4 Clayton    S           362 Y         Y        
#> 5 Ashley     R           364 N         Y        
#> 6 John       A           321 N         Y        
#> 7 John       S           388 Y         <NA>     
#> 8 Ashley     R           338 N         <NA>

Created on 2022-11-17 with reprex v2.0.2

  •  Tags:  
  • r
  • Related