Splitting data in column based on a word

Time:05-18

Is there code that creates a column containing only the speed number? The Cpu column, shown in the image, contains too much unnecessary information for me. I only want the GHz number (e.g. 2.3, 1.8 and 2.5).


CodePudding user response:

You can do something like this:

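library(dplyr)    # provides %>% and mutate()
library(stringr)

data %>%
  # digits with an optional decimal point, directly before a trailing "GHz"
  mutate(speed = as.numeric(str_extract(Cpu, "\\d*[.]?\\d+(?=GHz$)")))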

CodePudding user response:

A slightly easier regex is this:

library(dplyr)
library(stringr)
df %>%
  mutate(CPU_new = str_extract(Cpu, "[0-9.]+(?=GHz)"))

Without dplyr (plain assignment, still using stringr):

df$CPU_new <- str_extract(df$Cpu, "[0-9.]+(?=GHz)")

How this works:

  • [0-9.]+ : character class matching digits and the period, repeated one or more times
  • (?=GHz): positive lookahead asserting that the match to be extracted must be followed by the literal string GHz
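
For a version with no package dependencies, the same pattern can be used with base R's regexpr() and regmatches(). This is only a sketch: the cpu vector is made-up sample data (including one entry without "GHz"), and perl = TRUE is needed because the lookahead is not supported by R's default regex engine.

```r
# Hypothetical sample data; the last entry has no "GHz" on purpose
cpu <- c("Intel Core i5 2.3GHz",
         "Intel Core i5 1.8GHz",
         "Intel Core i5 72000U 2.3GHz",
         "AMD A9 unknown")

# Locate the first run of digits/periods that is followed by "GHz"
m <- regexpr("[0-9.]+(?=GHz)", cpu, perl = TRUE)
hit <- as.integer(m) != -1L          # regexpr() returns -1 where nothing matched

# regmatches() silently drops non-matches, so fill a vector of NAs instead
speed_chr <- rep(NA_character_, length(cpu))
speed_chr[hit] <- regmatches(cpu, m)

speed <- as.numeric(speed_chr)
```

Pre-filling with NA keeps the result the same length as the input, so it can be assigned straight back to a data frame column.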

CodePudding user response:

I think the other answer is better, but an alternative to a complicated regex is to extract just the 3 characters right before "GHz" using the stringr package. Note that this only works if every speed is written with exactly three characters, such as 2.3:

Data:

df <- data.frame(ScreenResolution = paste("Test",LETTERS[1:3]),
                 Cpu = c("Intel Core i5 2.3GHz","Intel Core i5 1.8GHz",
                         "Intel Core i5 72000U 2.3GHz"),
                 Ram = "8GB")

Code:

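library(stringr)
# str_locate() returns a matrix with one row per string; take the whole
# "start" column so that each row uses its own position of "GHz"
ghz_start <- str_locate(df$Cpu, pattern = "GHz")[, "start"]
df$Cpu_new <- str_sub(df$Cpu, ghz_start - 3, ghz_start - 1)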

Output:

#   ScreenResolution                         Cpu Ram Cpu_new
# 1           Test A        Intel Core i5 2.3GHz 8GB     2.3
# 2           Test B        Intel Core i5 1.8GHz 8GB     1.8
# 3           Test C Intel Core i5 72000U 2.3GHz 8GB     2.3

If you want the result to be numeric, wrap the call: as.numeric(str_sub(...))
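
The positional idea, including the numeric conversion, can also be sketched in base R with regexpr() and substr(). The cpu vector below is sample data copied from the Cpu column above, and the sketch assumes every speed occupies exactly three characters directly before "GHz".

```r
# Sample data mirroring the Cpu column above
cpu <- c("Intel Core i5 2.3GHz",
         "Intel Core i5 1.8GHz",
         "Intel Core i5 72000U 2.3GHz")

# Position of the first "GHz" in each string (-1 if absent)
ghz_start <- as.integer(regexpr("GHz", cpu, fixed = TRUE))

# Take the three characters immediately before "GHz"
speed_chr <- substr(cpu, ghz_start - 3, ghz_start - 1)
speed_chr[ghz_start == -1L] <- NA_character_   # no "GHz" found

# Convert the extracted text to numbers
speed <- as.numeric(speed_chr)
```

Because the position is computed per element, the third string (where "GHz" starts later) is handled correctly. A speed such as 2GHz or 2.35GHz would break the three-character assumption, which is where the regex answers are more robust.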
