I have two datasets of sales data at the end of month. I am just trying to find the matching rows from the dataset A and pull the shipped value from the dataset B. I have tried using merge and match but the values don't line up or the dataset explodes. I made a small example the real dataset has over 100 columns and around 500,00 rows but I didn't feel they were relevant.
Data set A
First Name | Last Name | Order | Shipped |
---|---|---|---|
John | S | 300 | Y |
Tim | B | 331 | Y |
Kathy | J | 365 | N |
Clayton | S | 362 | Y |
Ashley | R | 364 | N |
John | A | 321 | N |
John | S | 388 | Y |
Ashley | R | 338 | N |
Dataset B
First Name | Last Name | Order | Shipped |
---|---|---|---|
John | S | 300 | Y |
Tim | B | 331 | N |
Kathy | J | 365 | N |
Clayton | S | 362 | Y |
Ashley | R | 364 | Y |
John | A | 321 | Y |
Jake | K | 333 | N |
Bobby | J | 398 | N |
Desired output
First Name | Last Name | Order | Shipped A | Shipped B |
---|---|---|---|---|
John | S | 300 | Y | Y |
Tim | B | 331 | Y | N |
Kathy | J | 365 | N | N |
Clayton | S | 362 | Y | Y |
Ashley | R | 364 | N | Y |
John | A | 321 | N | Y |
John | S | 388 | Y | N/A |
Ashley | R | 338 | N | N/A |
CodePudding user response:
A base R solution using merge
merge(df1, df2, c("First_Name", "Last_Name", "Order"),
suffixes = c(" A"," B"), all.x = T)
First_Name Last_Name Order Shipped A Shipped B
1 Ashley R 338 N <NA>
2 Ashley R 364 N Y
3 Clayton S 362 Y Y
4 John A 321 N Y
5 John S 300 Y Y
6 John S 388 Y <NA>
7 Kathy J 365 N N
8 Tim B 331 Y N
Data
df1 <- structure(list(First_Name = c("John", "Tim", "Kathy", "Clayton",
"Ashley", "John", "John", "Ashley"), Last_Name = c("S", "B",
"J", "S", "R", "A", "S", "R"), Order = c(300L, 331L, 365L, 362L,
364L, 321L, 388L, 338L), Shipped = c("Y", "Y", "N", "Y", "N",
"N", "Y", "N")), class = "data.frame", row.names = c(NA, -8L))
df2 <- structure(list(First_Name = c("John", "Tim", "Kathy", "Clayton",
"Ashley", "John", "Jake", "Bobby"), Last_Name = c("S", "B", "J",
"S", "R", "A", "K", "J"), Order = c(300L, 331L, 365L, 362L, 364L,
321L, 333L, 398L), Shipped = c("Y", "N", "N", "Y", "Y", "Y",
"N", "N")), class = "data.frame", row.names = c(NA, -8L))
CodePudding user response:
It's hard to tell what went wrong without seeing your code. However, this line does the trick?
library(tidyverse)
# define data
dfa <- tibble::tribble(
~First.Name, ~Last.Name, ~Order, ~Shipped,
"John", "S", 300L, "Y",
"Tim", "B", 331L, "Y",
"Kathy", "J", 365L, "N",
"Clayton", "S", 362L, "Y",
"Ashley", "R", 364L, "N",
"John", "A", 321L, "N",
"John", "S", 388L, "Y",
"Ashley", "R", 338L, "N"
)
dfb <- tibble::tribble(
~First.Name, ~Last.Name, ~Order, ~Shipped,
"John", "S", 300L, "Y",
"Tim", "B", 331L, "N",
"Kathy", "J", 365L, "N",
"Clayton", "S", 362L, "Y",
"Ashley", "R", 364L, "Y",
"John", "A", 321L, "Y",
"Jake", "K", 333L, "N",
"Bobby", "J", 398L, "N"
)
# merge
left_join(dfa, dfb, by = c("First.Name", "Last.Name", "Order"))
#> # A tibble: 8 × 5
#> First.Name Last.Name Order Shipped.x Shipped.y
#> <chr> <chr> <int> <chr> <chr>
#> 1 John S 300 Y Y
#> 2 Tim B 331 Y N
#> 3 Kathy J 365 N N
#> 4 Clayton S 362 Y Y
#> 5 Ashley R 364 N Y
#> 6 John A 321 N Y
#> 7 John S 388 Y <NA>
#> 8 Ashley R 338 N <NA>
Created on 2022-11-17 with reprex v2.0.2