How to remove % sign from end of number in dataframe in R

Time:11-30

How can I find the elements of a data frame that end in % and remove the % sign from them?

data <- read.table(text="
COURSE          CLASE  GROUP_A   GROUP_B
algebra         1         25%        8%
algebra         2         35%        9%
number_theory   3         18%        7%
number_theory   4         14%        11%
math_games      5         12%        5%
math_games      6         19%        4%
",h=TRUE)

CodePudding user response:

Use lapply over the columns to remove any % at the end of each value, then use type.convert to turn the columns that now hold only numbers into numeric columns. No packages are used.

data |>
  replace(TRUE, lapply(data, sub, pattern = "%$", replacement = "")) |>
  type.convert(as.is = TRUE)

giving:

         COURSE CLASE GROUP_A GROUP_B
1       algebra     1      25       8
2       algebra     2      35       9
3 number_theory     3      18       7
4 number_theory     4      14      11
5    math_games     5      12       5
6    math_games     6      19       4
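As a small sketch of why the pattern is anchored with $, the snippet below runs the same sub() call on a made-up vector (the values in x are illustrative assumptions, not part of the question's data). Only a % at the very end of a string is removed, and once the strings contain only digits, type.convert() can infer a numeric type.

```r
# Hypothetical values for illustration; not taken from the question's data.
x <- c("25%", "8%", "5% off", "100")

# "%$" matches a percent sign only at the end of the string,
# so "5% off" is left untouched.
stripped <- sub("%$", "", x)
print(stripped)

# With the % removed, type.convert() infers an integer vector.
pct <- type.convert(c("25", "8", "11"), as.is = TRUE)
print(class(pct))
```

An unanchored pattern such as "%" would also strip the sign from the middle of "5% off", which is usually not what is wanted when cleaning percentage columns.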

With dplyr it is similar but we use across:

library(dplyr)

data %>%
  mutate(across(everything(), ~ sub("%$", "", .x))) %>%
  type.convert(as.is = TRUE)

CodePudding user response:

This code removes a trailing '%' from the columns you specify (here GROUP_A and GROUP_B) and, after an edit, also converts them to numeric:

mydata <- read.table(text="
        COURSE          CLASE  GROUP_A   GROUP_B
        algebra         1         25%        8%
        algebra         2         35%        9%
        number_theory   3         18%        7%
        number_theory   4         14%        11%
        math_games      5         12%        5%
        math_games      6         19%        4%
        ",h=TRUE)
mydata[,c("GROUP_A","GROUP_B")] <- lapply(mydata[,c("GROUP_A","GROUP_B")],
                                          function(x) as.numeric(gsub("%$","",x)))

Just a recommendation: I would not use data as the name of a data frame, because data() is already a function in R.

CodePudding user response:

Libraries

library(dplyr)
library(stringr)

Data

data <- read.table(text="
COURSE          CLASE  GROUP_A   GROUP_B
algebra         1         25%        8%
algebra         2         35%        9%
number_theory   3         18%        7%
number_theory   4         14%        11%
math_games      5         12%        5%
math_games      6         19%        4%
",h=TRUE) %>% as_tibble()

Solution

This uses across() from dplyr and str_remove() from stringr. As a bonus, I also convert those variables to numeric rather than leaving them as strings. If you don't need that, delete %>% as.numeric() and the rest works the same.

Inside across(), you specify the variables the transformation applies to; where it says c(GROUP_A, GROUP_B), you can list as many column names as you want.

data %>% 
  mutate(across(c(GROUP_A, GROUP_B), 
                ~str_remove(.x, "%") %>% as.numeric())) 

Output


#> # A tibble: 6 × 4
#>   COURSE        CLASE GROUP_A GROUP_B
#>   <chr>         <int>   <dbl>   <dbl>
#> 1 algebra           1      25       8
#> 2 algebra           2      35       9
#> 3 number_theory     3      18       7
#> 4 number_theory     4      14      11
#> 5 math_games        5      12       5
#> 6 math_games        6      19       4

Created on 2022-11-29 with reprex v2.0.2
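When there are many percentage columns, listing each name inside across() gets tedious. A possible variation, sketched here under the assumption that all such columns share the GROUP_ prefix, is to select them with the tidyselect helper starts_with(). The pattern "%$" is also a deliberate change from the answer's "%": it limits the removal to a trailing sign. The name result is just an illustrative choice.

```r
library(dplyr)
library(stringr)

# Same data as in the question.
data <- read.table(text = "
COURSE          CLASE  GROUP_A   GROUP_B
algebra         1         25%        8%
algebra         2         35%        9%
number_theory   3         18%        7%
number_theory   4         14%        11%
math_games      5         12%        5%
math_games      6         19%        4%
", header = TRUE)

# starts_with("GROUP_") picks GROUP_A and GROUP_B without naming each one
# (assumes every percentage column uses this prefix).
# "%$" restricts the removal to a percent sign at the end of the value.
result <- data %>%
  mutate(across(starts_with("GROUP_"), ~ as.numeric(str_remove(.x, "%$"))))

str(result)
```

Columns outside the selection, such as COURSE and CLASE, are left as they were read.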

CodePudding user response:

Could also do:

idx <- apply(data, 2, function(x) any(grepl('%$', x)))
data[idx] <- lapply(data[idx], function(x) as.numeric(sub('%$', '', x)))

Output:

         COURSE CLASE GROUP_A GROUP_B
1       algebra     1      25       8
2       algebra     2      35       9
3 number_theory     3      18       7
4 number_theory     4      14      11
5    math_games     5      12       5
6    math_games     6      19       4

Column types:

str(data)

'data.frame':   6 obs. of  4 variables:
 $ COURSE : Factor w/ 3 levels "algebra","math_games",..: 1 1 3 3 2 2
 $ CLASE  : int  1 2 3 4 5 6
 $ GROUP_A: num  25 35 18 14 12 19
 $ GROUP_B: num  8 9 7 11 5 4
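One detail of the approach above is that apply() first converts the whole data frame to a character matrix before checking each column. A possible alternative, shown here only as a sketch, is vapply(), which visits the columns of the data frame directly and guarantees one logical value per column.

```r
# Same data as in the question.
data <- read.table(text = "
COURSE          CLASE  GROUP_A   GROUP_B
algebra         1         25%        8%
algebra         2         35%        9%
number_theory   3         18%        7%
number_theory   4         14%        11%
math_games      5         12%        5%
math_games      6         19%        4%
", header = TRUE)

# For each column, TRUE if any value ends in "%".
# logical(1) makes vapply() fail loudly if a result is not a single logical.
idx <- vapply(data, function(x) any(grepl("%$", x)), logical(1))

# Strip the trailing "%" and convert only the flagged columns.
data[idx] <- lapply(data[idx], function(x) as.numeric(sub("%$", "", x)))

print(idx)
str(data)
```

The result for this data is the same as with apply(); the difference is only in how the columns are inspected.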