How do I separate strings in a variable and place them into new variables?

Time:01-23


I'm starting with a character (chr) variable in which each value holds several specs separated by "|".

df$Specs : chr [1:28752] "4 GB RAM | 64 GB ROM | ExpandableUpto256GB", "..."


My goal is to create 3 new variables called "RAM", "ROM" and "ExpandableUpto", each holding the corresponding string for that row ("xGBRAM", "xGBROM", "xExpandableUpto"). Then I would strip the text so that only the numbers remain (still as characters), convert them to numeric, and express them all in GB.

Here's where I'm at currently.

df$RAM : chr [1:28752] "4GBRAM", "..."

df$ROM : chr [1:28752] "64GBROM", "..."

df$ExpandableUpto : chr [1:28752] "ExpandableUpto256GB", "..."


I can get the chr strings into the new variables "RAM", "ROM" and "ExpandableUpto", but not every row has 3 strings (some have only 1 or 2), so the strings fill the variables one at a time starting with "RAM". That means some of my rows have "4GBROM" in the "RAM" variable. Is there a way to get only "RAM" strings in the "RAM" variable, only "ROM" strings in the "ROM" variable, and so on?

What I started with:


Remove whitespace in "Specs":

Mobile_phones7 <- Mobile_phones6 %>% mutate(Specs = stringr::str_remove_all(Specs, "\\s+"))

Split "Specs" on "|", which places the strings in a list of chr vectors of [1:3] strings:

Mobile_phones8 <- Mobile_phones7 %>% mutate(Specs = stringr::str_split(Specs, coll("|")))

Paste each list element back into a single string, with the pieces separated by spaces:

Mobile_phones9 <- Mobile_phones8 %>% rowwise() %>% mutate(Specs = Reduce(paste, Specs))

Separate "Specs" into 3 new variables, splitting on whitespace:

Mobile_phones10 <- Mobile_phones9 %>% separate(Specs, c("RAM", "ROM", "ExpandableUpto"), sep = "\\s+")

This resulted in the misaligned columns described above.

Thanks Ben

CodePudding user response:

Please check the below code

code

library(tidyverse)
library(stringr)

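data.frame(spec = "4 GB RAM | 64 GB ROM | ExpandableUpto256GB") %>%
  # split on the two literal "|" separators into three capture groups
  tidyr::extract(col = spec, into = c('RAM', 'ROM', 'ExpandableUpto'),
                 regex = '(.*)\\|(.*)\\|(.*)', remove = TRUE) %>%
  # strip all whitespace from the three new columns
  mutate(across(c(RAM, ROM, ExpandableUpto), ~ str_remove_all(.x, '\\s')))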

Created on 2023-01-20 with reprex v2.0.2

output

     RAM     ROM      ExpandableUpto
1 4GBRAM 64GBROM ExpandableUpto256GB
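
The pattern above needs exactly two "|" separators, so a row with only one or two specs would not match it and would come back as NA in all three columns. A minimal sketch of picking each piece by its label rather than by its position follows. It assumes every piece contains "RAM", "ROM" or "Upto"; the `specs` data frame and the `by_label` name are made up for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical sample rows with three, two, and one spec
specs <- data.frame(spec = c("4 GB RAM | 64 GB ROM | ExpandableUpto256GB",
                             "6 GB RAM | 128 GB ROM",
                             "128 GB ROM"))

by_label <- specs %>%
  mutate(
    # take the run of non-"|" characters that contains each label,
    # then drop all whitespace; rows without the label stay NA
    RAM            = str_remove_all(str_extract(spec, "[^|]*RAM"), "\\s+"),
    ROM            = str_remove_all(str_extract(spec, "[^|]*ROM"), "\\s+"),
    ExpandableUpto = str_remove_all(str_extract(spec, "[^|]*Upto[^|]*"), "\\s+")
  )
```

With this approach a row such as "128 GB ROM" puts "128GBROM" in ROM and leaves RAM and ExpandableUpto as NA, instead of shifting the value into the first column.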


CodePudding user response:

Please check updated code as per your comments

data & code

library(tidyverse)

data.frame(spec=c("4 GB RAM | 64 GB ROM | ExpandableUpto256GB",
                  "6 GB RAM | 128 GB ROM", 
                  "128 GB ROM",
                  "0 MB ROM | Expandable Upto 32 GB")) %>% 
  # digits followed by " <unit> RAM" / " <unit> ROM"
  transmute(RAM=as.numeric(str_extract(spec, '\\d+(?=\\s\\w+\\sRAM)')),
         ROM=as.numeric(str_extract(spec, '\\d+(?=\\s\\w+\\sROM)')),
         # digits after the literal "Upto"; a lookbehind on the class [Upto|\\s]
         # would also accept any number preceded by a space, such as the ROM value
         ExpandableUpto=as.numeric(str_match(spec, 'Upto\\s*(\\d+)')[, 2])
         ) 

Created on 2023-01-22 with reprex v2.0.2

output

  RAM ROM ExpandableUpto
1   4  64            256
2   6 128             NA
3  NA 128             NA
4  NA   0             32
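
The numbers above are taken as-is, so the "0 MB" in row 4 is read without regard to its unit, while the question also asks for everything to end up in GB. A rough sketch of one way to do that conversion follows. `to_gb()` is a hypothetical helper, not part of stringr; it assumes only MB and GB occur and that 1 GB = 1024 MB. The fifth sample row is made up to show an MB value.

```r
library(stringr)

spec <- c("4 GB RAM | 64 GB ROM | ExpandableUpto256GB",
          "6 GB RAM | 128 GB ROM",
          "128 GB ROM",
          "0 MB ROM | Expandable Upto 32 GB",
          "512 MB RAM | 4 GB ROM")          # hypothetical extra row

# Hypothetical helper: `pattern` must have two capture groups,
# the number first and the unit (MB or GB) second.
to_gb <- function(x, pattern) {
  m <- str_match(x, pattern)        # column 2 = number, column 3 = unit
  value <- as.numeric(m[, 2])
  # assumption: 1 GB = 1024 MB; rows without a match stay NA
  ifelse(m[, 3] == "MB", value / 1024, value)
}

num <- "(\\d+(?:\\.\\d+)?)"          # integer or decimal number
result <- data.frame(
  RAM            = to_gb(spec, paste0(num, "\\s*(MB|GB)\\s*RAM")),
  ROM            = to_gb(spec, paste0(num, "\\s*(MB|GB)\\s*ROM")),
  ExpandableUpto = to_gb(spec, paste0("Upto\\s*", num, "\\s*(MB|GB)"))
)
```

Here the made-up "512 MB RAM" row becomes 0.5 in the RAM column. If other units such as TB appear in the real data, the unit group and the rescaling step would need to be extended.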
