I have a dataframe df containing multiple columns named x_1, x_2, x_3 and so on that are class character but I want to turn them numeric. The problem is, that some of the values are decimals and others are in percentages, containing a %-sign. See my input here:
df <- data.frame(Company = c("abc", "def", "ghi"),
x_1 = c(0.05, "0.6%", "11.25%"),
x_y_1 = c("val_1", "val_2", "val_3"),
x_2 = c("3.5%", 0.12, 0.7),
x_y_2 = c("val_1", "val_2", "val_3"),
x_3 = c(0.83, 0.4, "0.9%"),
x_y_3 = c("val_1", "val_2", "val_3")
)
> df
Company x_1 x_y_1 x_2 x_y_2 x_3 x_y_3
1 abc 0.05 val_1 3.5% val_1 0.83 val_1
2 def 0.6% val_2 0.12 val_2 0.4 val_2
3 ghi 11.25% val_3 0.7 val_3 0.9% val_3
I want to turn the columns x_1, x_2, x_3 into numeric, so the output should look something like this:
> df
Company x_1 x_y_1 x_2 x_y_2 x_3 x_y_3
1 abc 0.05 val_1 0.035 val_1 0.83 val_1
2 def 0.006 val_2 0.12 val_2 0.4 val_2
3 ghi 0.1125 val_3 0.7 val_3 0.009 val_3
How do I transform only the values containing % signs into the right numeric value? Thank you.
CodePudding user response:
The premise in these examples is a two-step process:
- Determine the denominator based on the presence of
%
; - Derive the numerator by removing the
%
(whether or not it exists) and converting tonumeric
.
base R
df[c(2,4,6)] <- lapply(df[c(2,4,6)],
function(z) as.numeric(sub("%", "", z)) / ifelse(grepl("%", z), 100, 1))
df
# Company x_1 x_y_1 x_2 x_y_2 x_3 x_y_3
# 1 abc 0.0500 val_1 0.035 val_1 0.830 val_1
# 2 def 0.0060 val_2 0.120 val_2 0.400 val_2
# 3 ghi 0.1125 val_3 0.700 val_3 0.009 val_3
dplyr
library(dplyr)
df %>%
mutate(across(c(2, 4, 6),
~ as.numeric(sub("%", "", .)) / if_else(grepl("%", .), 100, 1)))
# Company x_1 x_y_1 x_2 x_y_2 x_3 x_y_3
# 1 abc 0.0500 val_1 0.035 val_1 0.830 val_1
# 2 def 0.0060 val_2 0.120 val_2 0.400 val_2
# 3 ghi 0.1125 val_3 0.700 val_3 0.009 val_3