Home > Mobile >  Replace values multiple columns
Replace values multiple columns

Time:10-07

I have a data table "data" with 25 columns. In some of the columns (about 15), which contain numeric values (but defined as characters after import), I want to replace certain characters, f.e. "," by ".", "<" by "", ">" by "" etc. (could be 10 or more combinations), because some values are like this "<0,17" or "> 1,5".

As the column names change (because it affects different data tables) I would like to solve it this way (it's not correct what I code, it's just to show, what I want to do).

replace <- list ("," = ".", "<" = "", ">" = "")
affectedColumns = c("name1", "name2", "name3" ... "name 14", "name 15").

mydata %>%
  mutate(affectedColumns, replace)

Another problem is, that some of the columns are numeric and some of them are characters. Does it make sense to first convert all values in "affectedColumns" to to characters (as.character) > then do the replacement process and then convert all of them back to numeric (as.numeric)?

In the end I would like to have values with "." as comma and without any "<" or ">" or blank spaces.

Is there a way to do that? Thank you!

CodePudding user response:

Here is a base R way.

mydata[affectedColumns] <- lapply(mydata[affectedColumns], \(x){
  for(nm in names(replace)) x <- sub(nm, replace[nm], x)
  as.numeric(x)
})

CodePudding user response:

You can use the parse_number from the readr package to convert to numeric while removing larger/less than signs.

library(readr)

df <- data.frame("name1" = c("1,5", "> 1,5", "<1,6"), 
                 "name2" = c("1,5", "1,5", "1,5"), 
                 "name3" = c("1,0", "1", "1"),
                 "name4" = c(1.5, 1, 0.5)
                 )

affectedColumns <- c("name1", "name2", "name3")

new_df <- mutate(df, across(affectedColumns, .fns = ~parse_number(.x, locale = locale(decimal_mark = ","))))

CodePudding user response:

Here's a dplyr solution:

library(dplyr)
mydata %>%
  # Step 1: remove < and >:
  mutate(across(c(everything()), 
                ~ sub("\\s?(>|<)", "", .))) %>%
  # Step 2: replace dot by comma:
  mutate(across(c(everything()), 
                ~ sub("\\.", ",", .))) 
  col1   col2
1  1,2 12,701
2    3  55,77
3    5   5000

EDIT:

Here's a solution with setNames and stringr:

First define your sets of new and old values (make sure to escape regex metacharacters such as the .):

replacements <- setNames(c("", "", ","),     # new values
                         c("<", ">", "\\.")) # old values

or, more economically:

replacements <- setNames(c("", ","),      # new values
                         c("<|>", "\\.")) # old values

Now use str_replace_allto implement the changes in one go:

library(stringr)
mydata %>%
  mutate(across(c(col1:col2), 
                ~ str_replace_all(., replacements)))

Toy data:

mydata <- data.frame(
  col1 = c("1.2", "3", "<5"),
  col2 = c(">12.701", "55,77", "< 5000")
)

CodePudding user response:

Consider a combination of the mutate, across and the case_when functions form the dplyr package. You can find them here: https://dplyr.tidyverse.org/reference/across.html and here: https://dplyr.tidyverse.org/reference/case_when.html or give a minimal reproducible example.

Best, M.

  • Related