I have a data frame with the following contents:
df$old_price <- c('SR 2356', 'SR 785', 'SR 50/4 pack', 'SR 10/4 pack', 'SR 490')
How do I replace values in the old_price column such as 'SR 50/4 pack' or 'SR 10/4 pack' so that they become 12.5 and 2.5 respectively, without corrupting the rest of the data?
I tried
df$old_price <- as.integer(gsub('[a-zA-Z]', '', df$old_price))
However, it seems to create strange column values.
CodePudding user response:
This could be another solution:
library(stringr)
vec <- df$old_price
unlist(lapply(str_extract(vec, "\\d.*\\d"), \(x) eval(parse(text = x))))
# [1] 2356.0  785.0   12.5    2.5  490.0
An alternative regex solution suggested by dear Ian Campbell:
unlist(lapply(str_extract(vec, "[\\d,./]+"), \(x) eval(parse(text = x))))
CodePudding user response:
Not sure how stable this is, but you could try
library(stringr)
library(dplyr)
df %>%
  mutate(new.price = as.integer(str_extract(old.price, "\\d+(?=/|$)")) /
           coalesce(as.integer(str_extract(old.price, "(?<=/)\\d+")), 1))
This returns
     old.price new.price
1      SR 2356    2356.0
2       SR 785     785.0
3 SR 50/4 pack      12.5
4 SR 10/4 pack       2.5
5       SR 490     490.0
CodePudding user response:
To get numeric values from strings that contain the division symbol /, an alternative is to extract the numeric part, split it on / so that the individual numbers are separated, convert those numbers to numeric, and then divide the first by the second.
# Define the function (expects a single string, hence the lapply below)
getNumber <- function(string_vect){
  extracted_number <- gsub(".*?([0-9/0-9]+).*", "\\1", string_vect)
  split_number <- strsplit(extracted_number, "/") |> unlist() |> as.numeric()
  divided_number <- split_number[1] / split_number[2]
  return(divided_number)
}
# Apply the function to the column
mydf <- data.frame(price = c("SR 50/4 pack", "SR 10/4"))
lapply(mydf$price, getNumber) |> unlist()
#[1] 12.5  2.5
If the column contains mixed values, some with / and others without it, the function can be modified with the conditionals if and else as follows:
getAllnumber <- function(string_vect){
  extracted_number <- gsub(".*?([0-9/0-9]+).*", "\\1", string_vect)
  if (grepl("/", string_vect)) {
    split_number <- strsplit(extracted_number, "/") |> unlist() |> as.numeric()
    resulted_number <- split_number[1] / split_number[2]
  } else {
    resulted_number <- extracted_number |> as.numeric()
  }
  return(resulted_number)
}
# Apply the function to the column
mydf <- data.frame(price = c("SR 2356", "SR 785", "SR 50/4 pack",
                             "SR 10/4 pack", "SR 490"))
lapply(mydf$price, getAllnumber) |> unlist()
#[1] 2356.0  785.0   12.5    2.5  490.0
# or
vapply(mydf$price, getAllnumber, numeric(1))
#      SR 2356       SR 785 SR 50/4 pack SR 10/4 pack       SR 490
#       2356.0        785.0         12.5          2.5        490.0
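The function above handles one string at a time, which is why it is wrapped in lapply or vapply. As a rough sketch of the same extract, split and divide idea applied to the whole column at once, something like the following might work. The helper name parse_price is made up for illustration and is not part of the answer; perl = TRUE is used only so the lazy quantifier is handled by PCRE.

```r
# Hypothetical helper (name is an assumption): takes the whole character vector.
parse_price <- function(x) {
  # Keep the first run of digits and slashes, e.g. "SR 50/4 pack" -> "50/4"
  core <- sub(".*?([0-9/]+).*", "\\1", x, perl = TRUE)
  # Split every element on "/" and convert the pieces to numbers
  parts <- lapply(strsplit(core, "/", fixed = TRUE), as.numeric)
  # Two pieces: numerator / denominator; otherwise return the first piece as is
  vapply(parts, function(p) if (length(p) == 2) p[1] / p[2] else p[1], numeric(1))
}

prices <- c("SR 2356", "SR 785", "SR 50/4 pack", "SR 10/4 pack", "SR 490")
result <- parse_price(prices)
print(result)
```

Strings that contain no digits at all would come back as NA (with a coercion warning), so it is probably worth checking for NA before overwriting the original column.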
CodePudding user response:
Here's a solution using regex matching with str_match from stringr.
#sample data
input <- structure(list(item_id = 1:5,
price = c(265L, 995L, 20L, 7L, 421L),
old_price = c("105", "No old price", "SR 50/4 pack", "SR 10/4 pack", "520")),
class = "data.frame", row.names = c(NA, -5L))
# item_id price    old_price
#       1   265          105
#       2   995 No old price
#       3    20 SR 50/4 pack
#       4     7 SR 10/4 pack
#       5   421          520
The regular expression captures each number as a group when a match is made (e.g. 50 and 4 in 50/4). I then use a loop to identify the records that matched and update each of them with the new value.
library(stringr)
for (i in seq_len(nrow(input))) {
  # m[, 1] is the full match; m[, 2] and m[, 3] are the two captured numbers
  m <- str_match(input$old_price[i], "(\\d+)\\/(\\d+)")
  if (!is.na(m[, 1])) {
    input$old_price[i] <- as.numeric(m[, 2]) / as.numeric(m[, 3])
  }
}
# item_id price    old_price
#       1   265          105
#       2   995 No old price
#       3    20         12.5
#       4     7          2.5
#       5   421          520
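Note that old_price is a character column, so the loop stores the results as the strings "12.5" and "2.5" next to "No old price". If a numeric column is needed, one option is to write the result to a separate column and leave unparseable rows as NA. The sketch below uses base R's regexec and regmatches in place of str_match so that it needs no packages; the column name old_price_num is an assumption introduced only for this example.

```r
# Same old_price values as in the sample data above
input <- data.frame(
  item_id = 1:5,
  old_price = c("105", "No old price", "SR 50/4 pack", "SR 10/4 pack", "520"),
  stringsAsFactors = FALSE
)

# For each row: full match plus the two capture groups, or character(0) if no "a/b"
m <- regmatches(input$old_price, regexec("([0-9]+)/([0-9]+)", input$old_price))

# old_price_num is a hypothetical column name, not part of the original data
input$old_price_num <- vapply(seq_along(m), function(i) {
  if (length(m[[i]]) == 3) {
    # "a/b" found: divide numerator by denominator
    as.numeric(m[[i]][2]) / as.numeric(m[[i]][3])
  } else {
    # No fraction: convert directly; non-numeric text such as "No old price" becomes NA
    suppressWarnings(as.numeric(input$old_price[i]))
  }
}, numeric(1))

print(input)
```

The original old_price text stays untouched, which makes it easier to check afterwards which rows could not be converted.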