Is there code to create a column containing only the speed number? The Cpu column, as shown in the image, includes too much unnecessary information for me. I only want the GHz number (e.g. 2.3, 1.8 and 2.5).
CodePudding user response:
You can do something like this:
library(dplyr)
library(stringr)
data %>%
  mutate(speed = as.numeric(str_extract(Cpu, "\\d*[.]?\\d+(?=GHz$)")))
CodePudding user response:
A slightly easier regex is this:
library(dplyr)
library(stringr)
df %>%
  mutate(CPU_new = str_extract(Cpu, "[0-9.]+(?=GHz)"))
Or, without the pipe (base R assignment, still using stringr's str_extract):
df$CPU_new <- str_extract(df$Cpu, "[0-9.]+(?=GHz)")
How this works:
[0-9.]+
: character class allowing digits and the period, occurring one or more times
(?=GHz)
: positive lookahead asserting that the match to be extracted must be followed by the literal string GHz
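The pattern explained above can also be tried without any packages. This is a minimal sketch, assuming sample strings shaped like those in the question (the cpu vector and the speed variable are made up for illustration). It swaps str_extract for base R's regexpr and regmatches, which need perl = TRUE because the default regex engine does not support lookaheads:

```r
# Sample values modelled on the Cpu column in the question (assumed data)
cpu <- c("Intel Core i5 2.3GHz",
         "Intel Core i5 1.8GHz",
         "Intel Core i7 7500U 2.5GHz")

# Locate the first run of digits/periods that is directly followed by "GHz".
# regexpr() returns -1 for elements without a match.
m <- regexpr("[0-9.]+(?=GHz)", cpu, perl = TRUE)

# regmatches() only returns the matched elements, so fill a vector of NAs
# and overwrite the positions where a match was found.
speed <- rep(NA_real_, length(cpu))
speed[m > 0] <- as.numeric(regmatches(cpu, m))

speed
```

Strings without "GHz" stay NA here, which mirrors what str_extract returns for a non-match.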
CodePudding user response:
I think the other answer is better, but an alternative to a complicated regex is to extract just the 3 characters right before "GHz" using the stringr package:
Data:
df <- data.frame(ScreenResolution = paste("Test", LETTERS[1:3]),
                 Cpu = c("Intel Core i5 2.3GHz", "Intel Core i5 1.8GHz",
                         "Intel Core i5 72000U 2.3GHz"),
                 Ram = "8GB")
Code:
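library(stringr)
# str_locate() returns a matrix with one row per string; use the whole "start"
# column so that each row gets its own position (indexing with [1] would reuse
# the first row's position for every string)
start <- str_locate(df$Cpu, pattern = "GHz")[, "start"]
df$Cpu_new <- str_sub(df$Cpu, start - 3, start - 1)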
Output:
# ScreenResolution Cpu Ram Cpu_new
# 1 Test A Intel Core i5 2.3GHz 8GB 2.3
# 2 Test B Intel Core i5 1.8GHz 8GB 1.8
# 3 Test C Intel Core i5 72000U 2.3GHz 8GB 2.3
If you want it to be numeric, wrap the call in as.numeric(), i.e. as.numeric(str_sub(...)).
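For reference, the same positional idea with the numeric conversion applied can be sketched in base R alone. This is only an illustration, not the stringr code from the answer: regexpr and substr stand in for str_locate and str_sub, and the cpu, start and speed names are assumptions. It also assumes every speed is exactly three characters long (such as "2.3"):

```r
# Same Cpu values as in the example data above
cpu <- c("Intel Core i5 2.3GHz",
         "Intel Core i5 1.8GHz",
         "Intel Core i5 72000U 2.3GHz")

# Start position of the literal "GHz" in each string (one value per element)
start <- regexpr("GHz", cpu, fixed = TRUE)

# Take the three characters immediately before "GHz" and convert to numeric
speed <- as.numeric(substr(cpu, start - 3, start - 1))

speed
```

A value with a different width, for example "2.25GHz", would be cut off by the fixed offsets, which is one reason the regex answers are more robust.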