Home > front end >  Match two tables's columns when the elements are separated by slash
Match two tables's columns when the elements are separated by slash

Time:11-06

I have a data frame that looks like this:

Names Identification
Animals 15/20/25/26
Fruits 1/2/3/4

And another data frame that looks like this:

Id Identification
Cat 15
Dog 20
Elephant 25
Mouse 26
Banana 1
Melon 2
Mango 3
Apple 4

I want to match the identification codes from the first table, to the second to create a new column in the original table:

Names Identification Id
Animals 15/20/25/26 Cat/Dog/Elephant/Mouse
Fruits 1/2/3/4 Banana/Melon/Mango/Apple

This is the code for the tables:


original <- data.frame(
    Names = c('Animals', 'Fruits'),
    Identification = c('15/20/25/26', '1/2/3/4')
)


to_match <- data.frame(
    Id = c('Cat', 'Dog', 'Elephant', 'Mouse','Banana', 'Melon', 'Mango','Apple'),
    Identification = c(15,20,25,26,1,2,3,4)
)

This is what I tried, which does not work due to the slashes.

original$Id <- to_match$Id[match(original$Identifcation, to_match$Identification),]
    

Any help would be appreciated, thanks.

CodePudding user response:

You can use separate_rows and then group again:

library(tidyverse)

a <- read.table(text = "Names   Identification
Animals 15/20/25/26
Fruits  1/2/3/4", header = T)

b <- read.table(text = "Id  Identification
Cat 15
Dog 20
Elephant    25
Mouse   26
Banana  1
Melon   2
Mango   3
Apple   4", header = T)

a %>%
  separate_rows(Identification) %>%
  left_join(b %>%
              mutate_all(as.character), by = "Identification") %>%
  group_by(Names) %>%
  summarise(Identification = str_c(Identification, collapse = "/"),
            Id = str_c(Id, collapse =  "/"), 
            .groups = "drop")

# # A tibble: 2 x 3
# Names   Identification Id                      
# <chr>   <chr>          <chr>                   
#   1 Animals 15/20/25/26    Cat/Dog/Elephant/Mouse  
# 2 Fruits  1/2/3/4        Banana/Melon/Mango/Apple

CodePudding user response:

stringi has a function that allows you to do exactly this:

original$Id <- stri_replace_all_fixed(original$Identification, to_match$Identification, to_match$Id, vectorize_all = F)
  • Related