I was wondering how to find the elements that end in % and remove the % sign from them.
data <- read.table(text="
COURSE CLASE GROUP_A GROUP_B
algebra 1 25% 8%
algebra 2 35% 9%
number_theory 3 18% 7%
number_theory 4 14% 11%
math_games 5 12% 5%
math_games 6 19% 4%
", header = TRUE)
Answer:
Use lapply over the columns to remove any % at the end, and then use type.convert to convert the columns that should be numeric to numeric. No packages are used.
# |> needs R >= 4.1; replace(data, TRUE, ...) swaps in all the cleaned columns at once
data |>
  replace(TRUE, lapply(data, sub, pattern = "%$", replacement = "")) |>
  type.convert(as.is = TRUE)
giving:
COURSE CLASE GROUP_A GROUP_B
1 algebra 1 25 8
2 algebra 2 35 9
3 number_theory 3 18 7
4 number_theory 4 14 11
5 math_games 5 12 5
6 math_games 6 19 4
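To see what each piece does on its own, here is a minimal sketch on a made-up character vector `x` (not from the question): `grepl("%$", x)` or base R's `endsWith(x, "%")` finds the elements that end in %, `sub("%$", "", x)` removes only a trailing %, and `type.convert()` turns the cleaned strings into numbers.

```r
# Hypothetical example vector, not taken from the question's data
x <- c("25%", "8%", "abc", "10")

# Find the elements that end in %: both return a logical vector
ends_pct  <- grepl("%$", x)      # regex: "%" anchored to the end of the string
ends_pct2 <- endsWith(x, "%")    # base R (>= 3.3.0), no regex needed

# Remove a % only when it is the last character
stripped <- sub("%$", "", x)

# type.convert() picks the simplest type that fits all the values it gets;
# here only the elements that had a % are converted
nums <- type.convert(stripped[ends_pct], as.is = TRUE)

print(ends_pct)
print(stripped)
print(nums)
```

Because "abc" is not a number, converting all of `stripped` would leave it as character; that is why the sketch converts only the flagged elements.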
With dplyr it is similar, but we use across():
library(dplyr)
data %>%
mutate(across(everything(), ~ sub("%$", "", .x))) %>%
type.convert(as.is = TRUE)
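The "if they should be numeric" part is handled by type.convert(), which decides the type column by column. A small sketch with a hypothetical data frame, where one column holds a value that is not a number, shows that such a column stays character instead of being forced to NA:

```r
# Hypothetical data frame: GROUP_C contains "n/a", which is not a number
df <- data.frame(
  COURSE  = c("algebra", "algebra"),
  GROUP_A = c("25%", "35%"),
  GROUP_C = c("18%", "n/a"),
  stringsAsFactors = FALSE
)

# Strip a trailing % in every column, as in the base R version above
df[] <- lapply(df, sub, pattern = "%$", replacement = "")

# type.convert() works per column: GROUP_A becomes integer,
# GROUP_C stays character because "n/a" cannot be parsed as a number
df <- type.convert(df, as.is = TRUE)

str(df)
```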
Answer:
This code removes a trailing '%' from the columns you specify (here GROUP_A and GROUP_B) and converts them to numeric:
mydata <- read.table(text="
COURSE CLASE GROUP_A GROUP_B
algebra 1 25% 8%
algebra 2 35% 9%
number_theory 3 18% 7%
number_theory 4 14% 11%
math_games 5 12% 5%
math_games 6 19% 4%
", header = TRUE)
mydata[,c("GROUP_A","GROUP_B")] <- lapply(mydata[,c("GROUP_A","GROUP_B")],
function(x) as.numeric(gsub("%$","",x)))
Just a recommendation: I would not use data as the name of a data frame, because data() is already a function in R (in the utils package).
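If you need this for different sets of columns, the same idea can be wrapped in a function. This is a minimal sketch; the name `strip_pct_cols` is hypothetical and not from the answer. It indexes with `df[cols]` rather than `df[, cols]`, so passing a single column name still yields a one-column data frame instead of a bare vector.

```r
# Hypothetical helper: remove a trailing % from the named columns
# and convert them with as.numeric()
strip_pct_cols <- function(df, cols) {
  stopifnot(all(cols %in% names(df)))
  df[cols] <- lapply(df[cols], function(x) as.numeric(sub("%$", "", x)))
  df
}

mydata <- read.table(text = "
COURSE CLASE GROUP_A GROUP_B
algebra 1 25% 8%
algebra 2 35% 9%
", header = TRUE)

mydata <- strip_pct_cols(mydata, c("GROUP_A", "GROUP_B"))
str(mydata)
```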
Answer:
Libraries
library(dplyr)
library(stringr)
Data
data <- read.table(text="
COURSE CLASE GROUP_A GROUP_B
algebra 1 25% 8%
algebra 2 35% 9%
number_theory 3 18% 7%
number_theory 4 14% 11%
math_games 5 12% 5%
math_games 6 19% 4%
", header = TRUE) %>% as_tibble()
Solution
Using across() from dplyr and str_remove() from stringr. As a bonus, I'm also converting those variables to numeric rather than leaving them as strings. If you don't need that, just delete %>% as.numeric(); the % removal works the same, and the columns simply stay character.
Inside across(), you specify all the variables you want to apply this transformation to: where it says c(GROUP_A, GROUP_B), you can add as many column names as you want.
data %>%
mutate(across(c(GROUP_A, GROUP_B),
~str_remove(.x, "%") %>% as.numeric()))
Output
#> # A tibble: 6 × 4
#> COURSE CLASE GROUP_A GROUP_B
#> <chr> <int> <dbl> <dbl>
#> 1 algebra 1 25 8
#> 2 algebra 2 35 9
#> 3 number_theory 3 18 7
#> 4 number_theory 4 14 11
#> 5 math_games 5 12 5
#> 6 math_games 6 19 4
Created on 2022-11-29 with reprex v2.0.2
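Note that the pattern "%" is not anchored, so str_remove() drops the first % wherever it appears; in this data the % is always the last character, so the result is the same. The sketch below uses base sub(), which, like str_remove(), replaces only the first match, on hypothetical values to show where the two patterns differ and how as.numeric() reacts to leftovers:

```r
# Hypothetical values: the last one has a % that is not at the end
x <- c("25%", "8%", "50% off")

# Unanchored pattern: removes the first % wherever it is
any_pct <- sub("%", "", x)

# Anchored pattern: removes a % only when it is the last character
end_pct <- sub("%$", "", x)

# as.numeric() turns anything that is not a number into NA (with a warning)
nums <- suppressWarnings(as.numeric(end_pct))

print(any_pct)
print(end_pct)
print(nums)
```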
Answer:
You could also detect which columns contain values ending in % and convert only those:
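# flag the columns in which at least one value ends in %
idx <- apply(data, 2, function(x) any(grepl('%$', x)))
# strip the trailing % from the flagged columns and convert them to numeric
data[idx] <- lapply(data[idx], function(x) as.numeric(sub('%$', '', x)))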
Output:
COURSE CLASE GROUP_A GROUP_B
1 algebra 1 25 8
2 algebra 2 35 9
3 number_theory 3 18 7
4 number_theory 4 14 11
5 math_games 5 12 5
6 math_games 6 19 4
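Column types (COURSE shows up as a factor because read.table() used stringsAsFactors = TRUE by default before R 4.0.0; in newer versions of R it is a character column):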
str(data)
'data.frame': 6 obs. of 4 variables:
$ COURSE : Factor w/ 3 levels "algebra","math_games",..: 1 1 3 3 2 2
$ CLASE : int 1 2 3 4 5 6
$ GROUP_A: num 25 35 18 14 12 19
$ GROUP_B: num 8 9 7 11 5 4
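apply(data, 2, ...) first converts the whole data frame to a character matrix before looking at the columns. Here is a sketch of the same detection with vapply(), which iterates over the columns directly and guarantees exactly one TRUE/FALSE per column (the data frame is called `df` here, following the earlier advice about not naming it `data`):

```r
df <- read.table(text = "
COURSE CLASE GROUP_A GROUP_B
algebra 1 25% 8%
algebra 2 35% 9%
number_theory 3 18% 7%
", header = TRUE)

# vapply() loops over the columns without a matrix conversion
# and errors if any call does not return a single logical value
idx <- vapply(df, function(x) any(grepl("%$", x)), logical(1))

# Convert only the flagged columns
df[idx] <- lapply(df[idx], function(x) as.numeric(sub("%$", "", x)))

str(df)
```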