I have two datasets. One contains internal data that needs to be reformatted for an internal system
Internal Name 1 Internal Name 2 Internal Name 3
1234 abcd testing
4567 efgh testing
78910 ijkl 123
The second dataset is empty with just column names. This is format template
Customer Name Reference Number
Basically I need to copy data from the internal data into the format template. Some columns are not needed and the order of the columns are different so the easiest way is to specify "Internal Name 1" goes to "Reference Number"
Customer Name Reference Number
abcd 1234
efgh 4567
ijkl 78910
I could just rename the columns and remove the columns not needed but how the columns are named and the order they are in are important for the internal system so I feel it would be easier to somehow copy the data into the format dataset
I have tried doing a simple data$Customer Name
= internaldata$Internal Name 1
but I get an error for the number of Rows not matching up. The number for rows received every week will be different so I can't just set this to a specific value
CodePudding user response:
I suppose if you have an empty template data frame like this:
template <- data.frame('Customer Name' = character(),
'Reference Number' = character(),
check.names = FALSE)
And a reproducible example of your data frame in the question:
dat <- data.frame(`Internal Name 1` = c("1234", "4567", "78910"),
`Internal Name 2` = c("abcd", "efgh", "ijkl"),
`Internal Name 3` = c("testing", "testing", "123"),
check.names = FALSE)
Then you could do the following:
library(dplyr)
full_join(template, dat,
by = c('Customer Name' = 'Internal Name 2',
'Reference Number' = 'Internal Name 1')) %>%
select(`Customer Name`, `Reference Number`)
#> Customer Name Reference Number
#> 1 abcd 1234
#> 2 efgh 4567
#> 3 ijkl 78910
If you store the result as template
, then running the same code with new dat
will add any new customer info to the existing data.
CodePudding user response:
You may match
received customers with internal name 2, which gives you the row number in internal data frame. Actually there is no need for an empty template. If the new name is not internally recorded you get NA
.
## customers in arbitrary number and order
received <- c("ijkl", "abcd", "efgh", "ijkl", "abcd", "efgh", 'nomatch')
newdat <- data.frame(Customer.Name=received,
Reference.Number=with(intern, Internal.Name.1[match(received, Internal.Name.2)]))
newdat
# Customer.Name Reference.Number
# 1 ijkl 78910
# 2 abcd 1234
# 3 efgh 4567
# 4 ijkl 78910
# 5 abcd 1234
# 6 efgh 4567
# 7 nomatch NA