Home > database >  How to Separate a column of unequal length into two parts using vector as reference in r?
How to Separate a column of unequal length into two parts using vector as reference in r?

Time:04-02

I am trying to cut a column values(sample_col2) into two columns(Brand, Model) by using a Reference column(sample_col) values for the separation.

library(tidyverse)

sample_df <- data.frame("sample_col" = c("Audi","BMW","MG","Hyundai","Kia","Maruti Suzuki"),
                        "value" = c(22,44,66,88,19,11),
                        "sample_col2" = c("Audi TT 2998","BMW X6 3355","MG HECTOR 2000","Hyundai CRETA 1500","Kia SELTOS 1500","Maruti Suzuki vitara brezza 1200"))  

sample_df
     sample_col value                      sample_col2
1          Audi    22                     Audi TT 2998
2           BMW    44                      BMW X6 3355
3            MG    66                   MG HECTOR 2000
4       Hyundai    88               Hyundai CRETA 1500
5           Kia    19                  Kia SELTOS 1500
6 Maruti Suzuki    11 Maruti Suzuki vitara brezza 1200

Results I need:

enter image description here

I have tried in below 2 different ways but none worked:

1st

sample_df %>% 
  separate(col = sample_col2, into = c("Brand","Model"), sep = sample_df$sample_col)

2nd

sample_list <- str_c(sample_df$sample_col, collapse =  "|")
sample_list

# output
# [1] "Audi|BMW|MG|Hyundai|Kia|Maruti Suzuki"
sample_df %>% 
  separate(col = sample_col2, into = c("Brand","Model"), sep = sample_list)

CodePudding user response:

Slightly simpler version:

sample_df %>% transmute(brand = sample_col, model = str_remove(sample_col2, sample_col))
#>           brand               model
#> 1          Audi             TT 2998
#> 2           BMW             X6 3355
#> 3            MG         HECTOR 2000
#> 4       Hyundai          CRETA 1500
#> 5           Kia         SELTOS 1500
#> 6 Maruti Suzuki  vitara brezza 1200

Created on 2022-04-02 by the reprex package (v2.0.1)

CodePudding user response:

This gives the expected output; does it solve your problem?

library(tidyverse)

sample_df <- data.frame("sample_col" = c("Audi","BMW","MG","Hyundai","Kia","Maruti Suzuki"),
                        "value" = c(22,44,66,88,19,11),
                        "sample_col2" = c("Audi TT 2998","BMW X6 3355","MG HECTOR 2000","Hyundai CRETA 1500","Kia SELTOS 1500","Maruti Suzuki vitara brezza 1200"))

brands <- paste0(sample_df$sample_col, collapse = "|")

sample_df %>%
  mutate(model = str_remove(sample_col2, brands)) %>%
  rename("brand" = sample_col) %>%
  select(brand, model)
#>           brand               model
#> 1          Audi             TT 2998
#> 2           BMW             X6 3355
#> 3            MG         HECTOR 2000
#> 4       Hyundai          CRETA 1500
#> 5           Kia         SELTOS 1500
#> 6 Maruti Suzuki  vitara brezza 1200

Created on 2022-04-02 by the reprex package (v2.0.1)

CodePudding user response:

In base R we may Vectorize gsub. We should also use trimws to delete the whitespace left over from removing model from brand.

transform(sample_df[-2], 
          sample_col2=trimws(Vectorize(gsub)(sample_col, '', sample_col2))) |>
  setNames(c('Model', 'Brand'))
#           Model               Brand
# 1          Audi             TT 2998
# 2           BMW             X6 3355
# 3            MG         HECTOR 2000
# 4       Hyundai          CRETA 1500
# 5           Kia         SELTOS 1500
# 6 Maruti Suzuki  vitara brezza 1200

Data:

sample_df <- structure(list(sample_col = c("Audi", "BMW", "MG", "Hyundai", 
"Kia", "Maruti Suzuki"), value = c(22, 44, 66, 88, 19, 11), sample_col2 = c("Audi TT 2998", 
"BMW X6 3355", "MG HECTOR 2000", "Hyundai CRETA 1500", "Kia SELTOS 1500", 
"Maruti Suzuki vitara brezza 1200")), class = "data.frame", row.names = c(NA, 
-6L))
  • Related