I'm starting with a character (chr) variable:
df$Specs : chr [1:28752] "4 GB RAM | 64 GB ROM | ExpandableUpto256GB", "..."
My goal is to create 3 new variables called "RAM", "ROM", and "ExpandableUpto", each holding the corresponding string for that row ("xGBRAM", "xGBROM", "ExpandableUptoxGB"). Then I will strip the letters so that only the digits are left as characters, convert those to numbers, and transform them all to GB units.
Here's where I'm at currently.
df$RAM : chr [1:28752] "4GBRAM", "..."
df$ROM : chr [1:28752] "64GBROM", "..."
df$ExpandableUpto : chr [1:28752] "ExpandableUpto256GB", "..."
I can get the chr strings into the new variables "RAM", "ROM", and "ExpandableUpto", but not every row has all 3 strings (some have only 1 or 2), so the strings fill the variables one at a time, starting with "RAM". That means some of my rows have "4GBROM" in the "RAM" variable. Is there a way to get only "RAM" strings in the "RAM" variable, only "ROM" strings in "ROM", and so on?
What I started with:
# Remove whitespace in "Specs"
Mobile_phones7 <- Mobile_phones6 %>% mutate(Specs = stringr::str_remove_all(Specs, "\\s+"))
# Split "Specs" on "|", giving a list of chr vectors of 1 to 3 strings
Mobile_phones8 <- Mobile_phones7 %>% mutate(Specs = stringr::str_split(Specs, stringr::coll("|")))
# Paste the pieces of each row back into one string, separated by spaces
Mobile_phones9 <- Mobile_phones8 %>% rowwise() %>% mutate(Specs = Reduce(paste, Specs))
# Separate "Specs" into 3 new variables, splitting on whitespace
Mobile_phones10 <- Mobile_phones9 %>% separate(Specs, c("RAM", "ROM", "ExpandableUpto"), sep = "\\s+")
This resulted in:
Thanks Ben
CodePudding user response:
Please check the code below.
code
library(tidyverse)
library(stringr)
data.frame(spec = "4 GB RAM | 64 GB ROM | ExpandableUpto256GB") %>%
  tidyr::extract(col = spec, into = c('RAM', 'ROM', 'ExpandableUpto'),
                 regex = '(.*)\\|(.*)\\|(.*)', remove = TRUE) %>%
  mutate(across(c(RAM, ROM, ExpandableUpto), ~ str_remove_all(.x, '\\s')))
Created on 2023-01-20 with reprex v2.0.2
output
     RAM     ROM      ExpandableUpto
1 4GBRAM 64GBROM ExpandableUpto256GB
CodePudding user response:
Please check the updated code, as per your comments.
data & code
library(tidyverse)

data.frame(spec = c("4 GB RAM | 64 GB ROM | ExpandableUpto256GB",
                    "6 GB RAM | 128 GB ROM",
                    "128 GB ROM",
                    "0 MB ROM | Expandable Upto 32 GB")) %>%
  transmute(RAM = as.numeric(str_extract(spec, '\\d+(?=\\s\\w+\\sRAM)')),
            ROM = as.numeric(str_extract(spec, '\\d+(?=\\s\\w+\\sROM)')),
            ExpandableUpto = as.numeric(str_extract(spec, '(?<=[Upto|\\s])\\d+')))
Created on 2023-01-22 with reprex v2.0.2
output
  RAM ROM ExpandableUpto
1   4  64             64
2   6 128            128
3  NA 128             NA
4  NA   0             32
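Two things in the output above are worth a second look. First, ExpandableUpto repeats the ROM value in rows 1 and 2: the lookbehind `(?<=[Upto|\\s])` is a character class, so it accepts any run of digits preceded by a space, not only digits that follow "Upto". Second, the unit is dropped, so an MB value would be reported as if it were GB. The following is only a sketch of one way to handle both, written in base R so it runs without extra packages. The helper `extract_gb`, the three patterns, the 1 GB = 1024 MB convention, and the fifth sample row are all assumptions added for illustration, not part of the question or the answers.

```r
# Hypothetical helper (not from the answers above): pull one capacity out of
# each spec string and express it in GB. Assumes 1 GB = 1024 MB.
extract_gb <- function(spec, pattern) {
  # Each list element is c(full match, number, unit), or character(0) if no match
  m <- regmatches(spec, regexec(pattern, spec, perl = TRUE))
  value <- vapply(m, function(x) if (length(x) == 3) as.numeric(x[2]) else NA_real_,
                  numeric(1))
  unit <- vapply(m, function(x) if (length(x) == 3) toupper(x[3]) else NA_character_,
                 character(1))
  to_gb <- c(MB = 1 / 1024, GB = 1, TB = 1024)
  value * unname(to_gb[unit])
}

# Assumed patterns: a number, an MB/GB/TB unit, and the label that names the field
ram_pattern <- "(\\d+(?:\\.\\d+)?)\\s*([MGT]B)\\s*RAM"
rom_pattern <- "(\\d+(?:\\.\\d+)?)\\s*([MGT]B)\\s*ROM"
exp_pattern <- "Upto\\s*(\\d+(?:\\.\\d+)?)\\s*([MGT]B)"

specs <- c("4 GB RAM | 64 GB ROM | ExpandableUpto256GB",
           "6 GB RAM | 128 GB ROM",
           "128 GB ROM",
           "0 MB ROM | Expandable Upto 32 GB",
           "512 MB RAM | 4 GB ROM")  # made-up row to exercise the MB conversion

result <- data.frame(RAM = extract_gb(specs, ram_pattern),
                     ROM = extract_gb(specs, rom_pattern),
                     ExpandableUpto = extract_gb(specs, exp_pattern))
print(result)
```

Because each pattern is tied to its own label ("RAM", "ROM", "Upto"), a row that lacks a field gets NA in that column instead of borrowing a value from another field, and the order of the pieces within the string no longer matters.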