I am trying to cut a column values(sample_col2) into two columns(Brand, Model) by using a Reference column(sample_col) values for the separation.
library(tidyverse)
sample_df <- data.frame("sample_col" = c("Audi","BMW","MG","Hyundai","Kia","Maruti Suzuki"),
"value" = c(22,44,66,88,19,11),
"sample_col2" = c("Audi TT 2998","BMW X6 3355","MG HECTOR 2000","Hyundai CRETA 1500","Kia SELTOS 1500","Maruti Suzuki vitara brezza 1200"))
sample_df
sample_col value sample_col2
1 Audi 22 Audi TT 2998
2 BMW 44 BMW X6 3355
3 MG 66 MG HECTOR 2000
4 Hyundai 88 Hyundai CRETA 1500
5 Kia 19 Kia SELTOS 1500
6 Maruti Suzuki 11 Maruti Suzuki vitara brezza 1200
Results I need:
I have tried in below 2 different ways but none worked:
1st
sample_df %>%
separate(col = sample_col2, into = c("Brand","Model"), sep = sample_df$sample_col)
2nd
sample_list <- str_c(sample_df$sample_col, collapse = "|")
sample_list
# output
# [1] "Audi|BMW|MG|Hyundai|Kia|Maruti Suzuki"
sample_df %>%
separate(col = sample_col2, into = c("Brand","Model"), sep = sample_list)
CodePudding user response:
Slightly simpler version:
sample_df %>% transmute(brand = sample_col, model = str_remove(sample_col2, sample_col))
#> brand model
#> 1 Audi TT 2998
#> 2 BMW X6 3355
#> 3 MG HECTOR 2000
#> 4 Hyundai CRETA 1500
#> 5 Kia SELTOS 1500
#> 6 Maruti Suzuki vitara brezza 1200
Created on 2022-04-02 by the reprex package (v2.0.1)
CodePudding user response:
This gives the expected output; does it solve your problem?
library(tidyverse)
sample_df <- data.frame("sample_col" = c("Audi","BMW","MG","Hyundai","Kia","Maruti Suzuki"),
"value" = c(22,44,66,88,19,11),
"sample_col2" = c("Audi TT 2998","BMW X6 3355","MG HECTOR 2000","Hyundai CRETA 1500","Kia SELTOS 1500","Maruti Suzuki vitara brezza 1200"))
brands <- paste0(sample_df$sample_col, collapse = "|")
sample_df %>%
mutate(model = str_remove(sample_col2, brands)) %>%
rename("brand" = sample_col) %>%
select(brand, model)
#> brand model
#> 1 Audi TT 2998
#> 2 BMW X6 3355
#> 3 MG HECTOR 2000
#> 4 Hyundai CRETA 1500
#> 5 Kia SELTOS 1500
#> 6 Maruti Suzuki vitara brezza 1200
Created on 2022-04-02 by the reprex package (v2.0.1)
CodePudding user response:
In base R we may Vectorize
gsub
. We should also use trimws
to delete the whitespace left over from removing model from brand.
transform(sample_df[-2],
sample_col2=trimws(Vectorize(gsub)(sample_col, '', sample_col2))) |>
setNames(c('Model', 'Brand'))
# Model Brand
# 1 Audi TT 2998
# 2 BMW X6 3355
# 3 MG HECTOR 2000
# 4 Hyundai CRETA 1500
# 5 Kia SELTOS 1500
# 6 Maruti Suzuki vitara brezza 1200
Data:
sample_df <- structure(list(sample_col = c("Audi", "BMW", "MG", "Hyundai",
"Kia", "Maruti Suzuki"), value = c(22, 44, 66, 88, 19, 11), sample_col2 = c("Audi TT 2998",
"BMW X6 3355", "MG HECTOR 2000", "Hyundai CRETA 1500", "Kia SELTOS 1500",
"Maruti Suzuki vitara brezza 1200")), class = "data.frame", row.names = c(NA,
-6L))