Replace divided by symbol in numeric column

Time:03-26

I have a data frame with the following contents:

df$old_price <- c('SR 2356', 'SR 785', 'SR 50/4 pack', 'SR 10/4 pack', 'SR 490')

How do I replace values in the old_price column such as 'SR 50/4 pack' or 'SR 10/4 pack' so that they give 12.5 and 2.5 respectively, without corrupting the rest of the data?

I tried df$old_price <- as.integer(gsub('[a-zA-Z]', '', df$old_price)). However, it seems like it creates strange column values.
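
Looking at the intermediate strings may show where the strange values come from. This is a minimal sketch in base R; old_price here is a standalone vector standing in for the column, not the original data frame.

```r
# Standalone copy of the values from the question (assumption: no data frame needed)
old_price <- c("SR 2356", "SR 785", "SR 50/4 pack", "SR 10/4 pack", "SR 490")

# Removing letters leaves the slash behind, so "50/4" is still text, not a number
stripped <- gsub("[a-zA-Z]", "", old_price)
print(stripped)

# as.integer() cannot parse " 50/4 ", so those rows become NA (with a warning)
converted <- suppressWarnings(as.integer(stripped))
print(converted)
```

The plain prices survive because as.integer() tolerates the leftover spaces, while the rows that still contain a slash turn into NA.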


CodePudding user response:

This could be another solution:

library(stringr)

# vec is the character vector of prices, e.g. df$old_price
unlist(lapply(str_extract(vec, "\\d.*\\d"), \(x) eval(parse(text = x))))

[1] 2356.0  785.0   12.5    2.5  490.0

An alternative regex, suggested by dear Ian Campbell:

unlist(lapply(str_extract(vec, "[\\d,./]+"), \(x) eval(parse(text = x))))
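
Before handing anything to eval(parse()), it may be worth checking what each pattern actually extracts. The sketch below uses base R with perl = TRUE rather than stringr, assumes the character class is meant to be followed by +, and adds a made-up single-digit value "SR 5" that is not in the question. extract_first is a hypothetical helper name.

```r
# Sample values; "SR 5" is an invented edge case, not from the question
vec <- c("SR 2356", "SR 50/4 pack", "SR 5")

# Hypothetical helper: first match of `pattern` in each string, NA when none
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  found <- m > 0
  out[found] <- regmatches(x, m)
  out
}

# "\\d.*\\d" needs at least two digits, so a lone digit is not matched
first <- extract_first(vec, "\\d.*\\d")
print(first)

# The character-class pattern also accepts a single digit
second <- extract_first(vec, "[\\d,./]+")
print(second)
```

On these values the two patterns only differ for the single-digit entry, which the first pattern skips.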

CodePudding user response:

Not sure how stable this is, but you could try

library(stringr)
library(dplyr)

df %>% 
  mutate(new.price = as.integer(str_extract(old.price, "\\d+(?=/|$)")) / coalesce(as.integer(str_extract(old.price, "(?<=/)\\d+")), 1))

This returns

     old.price new.price
1      SR 2356    2356.0
2       SR 785     785.0
3 SR 50/4 pack      12.5
4 SR 10/4 pack       2.5
5       SR 490     490.0

CodePudding user response:

To get a numeric value from a string that contains the division symbol /, an alternative is to extract the "number/number" part, split it on /, convert both pieces to numeric, and then divide the first by the second.

# Define the function
getNumber <- function(string_vect){
   # Keep only the run of digits and slashes, e.g. "50/4"
   extracted_number <- gsub(".*?([0-9/]+).*", "\\1", string_vect)
   # Split into numerator and denominator and convert to numeric
   split_number <- strsplit(extracted_number, "/") |> unlist() |> as.numeric()
   divided_number <- split_number[1]/split_number[2]
   return(divided_number)
}
#Apply the function to the column
mydf <- data.frame(price = c("SR 50/4 pack", "SR 10/4"))

lapply(mydf$price, getNumber) |> unlist()
#[1] 12.5  2.5

If the column mixes values with and without /, the function can be extended with if and else as follows:

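getAllnumber <- function(string_vect){
    # Keep only the run of digits and slashes, e.g. "50/4" or "2356"
    extracted_number <- gsub(".*?([0-9/]+).*", "\\1", string_vect)
    if(grepl("/", string_vect)){
        # With a slash: split into numerator and denominator, then divide
        split_number <- strsplit(extracted_number, "/") |> unlist() |> as.numeric()
        resulted_number <- split_number[1]/split_number[2]
    } else {
        # Without a slash: the extracted text is already a plain number
        resulted_number <- extracted_number |> as.numeric()
    }
    return(resulted_number)
}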

#apply the function to the column

mydf <- data.frame(price = c("SR 2356","SR 785","SR 50/4 pack",
                            "SR 10/4 pack","SR 490"))

lapply(mydf$price, getAllnumber) |> unlist()
#[1] 2356.0  785.0   12.5    2.5  490.0

# or 
vapply(mydf$price, getAllnumber, numeric(1))
#     SR 2356      SR 785 SR 50/4 pack   SR 10/4 pack    SR 490 
#     2356.0        785.0         12.5            2.5     490.0 
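
The same split-then-divide idea can also be sketched as a function that takes the whole column at once, so no lapply is needed at the call site. This is only one possible variant in base R; parse_price is a hypothetical name, and it assumes each value contains at most one "number/number" part.

```r
# Hypothetical vectorised variant of the split-then-divide approach
parse_price <- function(x) {
  # Keep the first run of digits and slashes, e.g. "50/4" or "2356"
  frac <- sub("^[^0-9]*([0-9/]+).*$", "\\1", x)
  # One character vector per element: c("50", "4") or just "2356"
  parts <- strsplit(frac, "/", fixed = TRUE)
  # Reduce() divides left to right and returns a lone number unchanged
  vapply(parts, function(p) Reduce(`/`, as.numeric(p)), numeric(1))
}

prices <- c("SR 2356", "SR 785", "SR 50/4 pack", "SR 10/4 pack", "SR 490")
result <- parse_price(prices)
print(result)
```

Using Reduce() here removes the need for a separate branch for values without a slash.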

CodePudding user response:

Here's a solution using regex matching with str_match from stringr.

#sample data
input <- structure(list(item_id = 1:5, 
                    price = c(265L, 995L, 20L, 7L, 421L), 
                    old_price = c("105", "No old price", "SR 50/4 pack", "SR 10/4 pack", "520")), 
                    class = "data.frame", row.names = c(NA, -5L))

# item_id price    old_price
# 1       265      105
# 2       995      No old price
# 3       20       SR 50/4 pack
# 4       7        SR 10/4 pack
# 5       421      520

The regular expression captures the numerator and the denominator as separate groups when a match is made (e.g. 50/4). I then use a loop to identify the records that matched and update each one with the new value.

library(stringr)

for(i in seq_len(nrow(input))) {
  # Columns of m: full match, numerator, denominator (all NA when there is no match)
  m <- str_match(input$old_price[i], "(\\d+)/(\\d+)")

  if(!is.na(m[,1])) {
    input$old_price[i] <- as.numeric(m[,2]) / as.numeric(m[,3])
  }
}

# item_id price    old_price
# 1       265      105
# 2       995      No old price
# 3       20       12.5
# 4       7        2.5
# 5       421      520
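
The loop could also be replaced by a single pass over the column. The sketch below is an assumption-laden variant that uses base R's regexec() and regmatches() instead of stringr; the sample data is rebuilt with data.frame() so the block runs on its own.

```r
# Same sample data as above, rebuilt so this block is self-contained
input <- data.frame(
  item_id = 1:5,
  price = c(265L, 995L, 20L, 7L, 421L),
  old_price = c("105", "No old price", "SR 50/4 pack", "SR 10/4 pack", "520"),
  stringsAsFactors = FALSE
)

# For each element: c(full match, numerator, denominator), or character(0) if no match
m <- regmatches(input$old_price, regexec("([0-9]+)/([0-9]+)", input$old_price))

# Rows where the "number/number" pattern was found
hit <- lengths(m) == 3

# Divide numerator by denominator for the matching rows only
ratio <- vapply(m[hit], function(g) as.numeric(g[2]) / as.numeric(g[3]), numeric(1))

# old_price is a character column, so the results are stored as text ("12.5", "2.5")
input$old_price[hit] <- ratio
print(input)
```

As with the loop, old_price stays a character column because it still holds values such as "No old price".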