Home > Software engineering >  Transform column into numeric if some values have % sign and some don't
Transform column into numeric if some values have % sign and some don't

Time:01-10

I have a dataframe df containing multiple columns named x_1, x_2, x_3 and so on that are class character but I want to turn them numeric. The problem is, that some of the values are decimals and others are in percentages, containing a %-sign. See my input here:

df <- data.frame(Company  = c("abc", "def", "ghi"),
                  x_1 = c(0.05, "0.6%", "11.25%"),
                  x_y_1 = c("val_1", "val_2", "val_3"),
                  x_2 = c("3.5%", 0.12, 0.7),
                  x_y_2 = c("val_1", "val_2", "val_3"),
                  x_3 = c(0.83, 0.4, "0.9%"),
                  x_y_3 = c("val_1", "val_2", "val_3")
                  )
> df
  Company    x_1 x_y_1  x_2 x_y_2  x_3 x_y_3
1     abc   0.05 val_1 3.5% val_1 0.83 val_1
2     def   0.6% val_2 0.12 val_2  0.4 val_2
3     ghi 11.25% val_3  0.7 val_3 0.9% val_3

I want to turn the columns x_1, x_2, x_3 into numeric, so the output should look something like this:

> df
  Company    x_1   x_y_1  x_2   x_y_2  x_3   x_y_3
1     abc   0.05   val_1  0.035 val_1  0.83  val_1
2     def   0.006  val_2  0.12  val_2  0.4   val_2
3     ghi   0.1125 val_3  0.7   val_3  0.009 val_3

How do I transform only the values containing % signs into the right numeric value? Thank you.

CodePudding user response:

The premise in these examples is a two-step process:

  1. Determine the denominator based on the presence of %;
  2. Derive the numerator by removing the % (whether or not it exists) and converting to numeric.

base R

df[c(2,4,6)] <- lapply(df[c(2,4,6)],
                       function(z) as.numeric(sub("%", "", z)) / ifelse(grepl("%", z), 100, 1))
df
#   Company    x_1 x_y_1   x_2 x_y_2   x_3 x_y_3
# 1     abc 0.0500 val_1 0.035 val_1 0.830 val_1
# 2     def 0.0060 val_2 0.120 val_2 0.400 val_2
# 3     ghi 0.1125 val_3 0.700 val_3 0.009 val_3

dplyr

library(dplyr)
df %>%
  mutate(across(c(2, 4, 6),
         ~ as.numeric(sub("%", "", .)) / if_else(grepl("%", .), 100, 1)))
#   Company    x_1 x_y_1   x_2 x_y_2   x_3 x_y_3
# 1     abc 0.0500 val_1 0.035 val_1 0.830 val_1
# 2     def 0.0060 val_2 0.120 val_2 0.400 val_2
# 3     ghi 0.1125 val_3 0.700 val_3 0.009 val_3
  • Related