Home > Enterprise >  Reformat dataset in R into a different empty dataset
Reformat dataset in R into a different empty dataset

Time:08-31

I have two datasets. One contains internal data that needs to be reformatted for an internal system

Internal Name 1  Internal Name 2 Internal Name 3
   1234            abcd            testing
   4567            efgh            testing 
   78910           ijkl            123

The second dataset is empty with just column names. This is format template

Customer Name    Reference Number

Basically I need to copy data from the internal data into the format template. Some columns are not needed and the order of the columns are different so the easiest way is to specify "Internal Name 1" goes to "Reference Number"

Customer Name     Reference Number
abcd               1234
efgh               4567
ijkl               78910   

I could just rename the columns and remove the columns not needed but how the columns are named and the order they are in are important for the internal system so I feel it would be easier to somehow copy the data into the format dataset

I have tried doing a simple data$Customer Name = internaldata$Internal Name 1 but I get an error for the number of Rows not matching up. The number for rows received every week will be different so I can't just set this to a specific value

CodePudding user response:

I suppose if you have an empty template data frame like this:

template <- data.frame('Customer Name' = character(), 
                       'Reference Number' = character(),
                       check.names = FALSE)

And a reproducible example of your data frame in the question:

dat <- data.frame(`Internal Name 1` = c("1234", "4567", "78910"), 
                  `Internal Name 2` = c("abcd", "efgh", "ijkl"), 
                  `Internal Name 3` = c("testing", "testing", "123"),
                  check.names = FALSE)

Then you could do the following:

library(dplyr)

full_join(template, dat, 
          by = c('Customer Name' = 'Internal Name 2',
                 'Reference Number' = 'Internal Name 1')) %>%
  select(`Customer Name`, `Reference Number`)
#>   Customer Name Reference Number
#> 1          abcd             1234
#> 2          efgh             4567
#> 3          ijkl            78910

If you store the result as template, then running the same code with new dat will add any new customer info to the existing data.

CodePudding user response:

You may match received customers with internal name 2, which gives you the row number in internal data frame. Actually there is no need for an empty template. If the new name is not internally recorded you get NA.

## customers in arbitrary number and order
received <- c("ijkl", "abcd", "efgh", "ijkl", "abcd", "efgh", 'nomatch')  

newdat <- data.frame(Customer.Name=received, 
                     Reference.Number=with(intern, Internal.Name.1[match(received, Internal.Name.2)]))
newdat
#   Customer.Name Reference.Number
# 1          ijkl            78910
# 2          abcd             1234
# 3          efgh             4567
# 4          ijkl            78910
# 5          abcd             1234
# 6          efgh             4567
# 7       nomatch               NA
  • Related