I have a data table "data" with 25 columns. In some of the columns (about 15), which contain numeric values (but defined as characters after import), I want to replace certain characters, f.e. "," by ".", "<" by "", ">" by "" etc. (could be 10 or more combinations), because some values are like this "<0,17" or "> 1,5".
As the column names change (because it affects different data tables) I would like to solve it this way (it's not correct what I code, it's just to show, what I want to do).
replace <- list ("," = ".", "<" = "", ">" = "")
affectedColumns = c("name1", "name2", "name3" ... "name 14", "name 15").
mydata %>%
mutate(affectedColumns, replace)
Another problem is, that some of the columns are numeric and some of them are characters. Does it make sense to first convert all values in "affectedColumns" to to characters (as.character) > then do the replacement process and then convert all of them back to numeric (as.numeric)?
In the end I would like to have values with "." as comma and without any "<" or ">" or blank spaces.
Is there a way to do that? Thank you!
CodePudding user response:
Here is a base R way.
mydata[affectedColumns] <- lapply(mydata[affectedColumns], \(x){
for(nm in names(replace)) x <- sub(nm, replace[nm], x)
as.numeric(x)
})
CodePudding user response:
You can use the parse_number
from the readr
package to convert to numeric while removing larger/less than signs.
library(readr)
df <- data.frame("name1" = c("1,5", "> 1,5", "<1,6"),
"name2" = c("1,5", "1,5", "1,5"),
"name3" = c("1,0", "1", "1"),
"name4" = c(1.5, 1, 0.5)
)
affectedColumns <- c("name1", "name2", "name3")
new_df <- mutate(df, across(affectedColumns, .fns = ~parse_number(.x, locale = locale(decimal_mark = ","))))
CodePudding user response:
Here's a dplyr
solution:
library(dplyr)
mydata %>%
# Step 1: remove < and >:
mutate(across(c(everything()),
~ sub("\\s?(>|<)", "", .))) %>%
# Step 2: replace dot by comma:
mutate(across(c(everything()),
~ sub("\\.", ",", .)))
col1 col2
1 1,2 12,701
2 3 55,77
3 5 5000
EDIT:
Here's a solution with setNames
and stringr
:
First define your sets of new and old values (make sure to escape regex metacharacters such as the .
):
replacements <- setNames(c("", "", ","), # new values
c("<", ">", "\\.")) # old values
or, more economically:
replacements <- setNames(c("", ","), # new values
c("<|>", "\\.")) # old values
Now use str_replace_all
to implement the changes in one go:
library(stringr)
mydata %>%
mutate(across(c(col1:col2),
~ str_replace_all(., replacements)))
Toy data:
mydata <- data.frame(
col1 = c("1.2", "3", "<5"),
col2 = c(">12.701", "55,77", "< 5000")
)
CodePudding user response:
Consider a combination of the mutate
, across
and the case_when
functions form the dplyr
package. You can find them here: https://dplyr.tidyverse.org/reference/across.html and here: https://dplyr.tidyverse.org/reference/case_when.html or give a minimal reproducible example.
Best, M.