I'd like to apply case_when to all columns in the data frame.
library(dplyr)
set.seed(1)
data <- tibble(x = runif(10), y = x * 2)
data
I'd like to replace every value above 0.5 with the string ">0.5", and every value above 1 with ">1".
I've tried case_when, but it seems I have to name each column explicitly (x and y). I'd like to apply case_when to the entire data frame without specifying columns.
CodePudding user response:
A purrr solution:
library(dplyr) # provides %>% and case_when
library(purrr)
data %>%
map_df(~case_when(.x > 0.5 & .x < 1 ~ ">0.5",
.x >= 1 ~ ">1"))
Output:
x y
<chr> <chr>
1 NA >0.5
2 NA >0.5
3 >0.5 >1
4 >0.5 >1
5 NA NA
6 >0.5 >1
7 >0.5 >1
8 >0.5 >1
9 >0.5 >1
10 NA NA
CodePudding user response:
Here is a potential solution:
library(tidyverse)
set.seed(1)
data <- tibble(x = runif(10), y = x * 2)
data
#> # A tibble: 10 × 2
#> x y
#> <dbl> <dbl>
#> 1 0.266 0.531
#> 2 0.372 0.744
#> 3 0.573 1.15
#> 4 0.908 1.82
#> 5 0.202 0.403
#> 6 0.898 1.80
#> 7 0.945 1.89
#> 8 0.661 1.32
#> 9 0.629 1.26
#> 10 0.0618 0.124
data %>%
mutate(across(everything(),
~case_when(.x > 0.5 & .x < 1.0 ~ ">0.5",
.x >= 1.0 ~ ">1")))
#> # A tibble: 10 × 2
#> x y
#> <chr> <chr>
#> 1 <NA> >0.5
#> 2 <NA> >0.5
#> 3 >0.5 >1
#> 4 >0.5 >1
#> 5 <NA> <NA>
#> 6 >0.5 >1
#> 7 >0.5 >1
#> 8 >0.5 >1
#> 9 >0.5 >1
#> 10 <NA> <NA>
Created on 2021-10-24 by the reprex package (v2.0.1)
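In both approaches above, values at or below 0.5 become NA because no case_when condition matches them. A minimal sketch of one way to keep those values instead, assuming you want them as rounded strings and want non-numeric columns left alone. The label column and the name data2 are hypothetical and added only for illustration:

```r
library(dplyr)

set.seed(1)
# `label` is a hypothetical non-numeric column, not part of the original data
data2 <- tibble(x = runif(10), y = x * 2, label = letters[1:10])

result <- data2 %>%
  mutate(across(where(is.numeric),              # skip non-numeric columns
                ~ case_when(.x >= 1  ~ ">1",
                            .x > 0.5 ~ ">0.5",
                            # fallback: keep the value itself, as text
                            TRUE     ~ as.character(round(.x, 3)))))
result
```

Every branch of case_when has to return the same type, which is why the fallback converts the number to character rather than returning .x directly.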
CodePudding user response:
We can use if_all with everything() (to select all the columns) to create the logical vector:
library(dplyr)
data %>%
mutate(new = case_when(if_all(everything(), `>`, 1) ~ ">1", if_all(everything(), `>`, 0.5) ~ ">0.5")
)
Output:
# A tibble: 10 × 3
x y new
<dbl> <dbl> <chr>
1 0.266 0.531 <NA>
2 0.372 0.744 <NA>
3 0.573 1.15 >0.5
4 0.908 1.82 >0.5
5 0.202 0.403 <NA>
6 0.898 1.80 >0.5
7 0.945 1.89 >0.5
8 0.661 1.32 >0.5
9 0.629 1.26 >0.5
10 0.0618 0.124 <NA>
NOTE: Since the OP asked for case_when "on the entire data frame", this creates a single new column by evaluating each condition across all columns of a row at once.
If the OP meant to recode each column separately, use between for the middle range and an explicit condition for values above 1 (values below 0.5 match neither and stay NA):
data %>%
mutate(across(everything(),
~ case_when(between(.x, 0.5, 1) ~ "> 0.5", .x > 1 ~ "> 1")))
# A tibble: 10 × 2
x y
<chr> <chr>
1 <NA> > 0.5
2 <NA> > 0.5
3 > 0.5 > 1
4 > 0.5 > 1
5 <NA> <NA>
6 > 0.5 > 1
7 > 0.5 > 1
8 > 0.5 > 1
9 > 0.5 > 1
10 <NA> <NA>
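To recode all values in one call instead, flatten the data to a vector and put the stricter condition first, because case_when returns the first match:
out <- as.data.frame(data)
vals <- unlist(data, use.names = FALSE) # column-major vector of all values
out[] <- case_when(vals > 1 ~ "> 1", vals > 0.5 ~ "> 0.5")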
CodePudding user response:
You can use cut:
library(dplyr)
data %>%
mutate(across(everything(), ~cut(., c(0.5, 1, Inf), c(">0.5", ">1"))))
# x y
# <fct> <fct>
# 1 NA >0.5
# 2 NA >0.5
# 3 >0.5 >1
# 4 >0.5 >1
# 5 NA NA
# 6 >0.5 >1
# 7 >0.5 >1
# 8 >0.5 >1
# 9 >0.5 >1
#10 NA NA
In base R, with lapply:
data[] <- lapply(data, function(x) cut(x, c(0.5, 1, Inf), c(">0.5", ">1")))
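One detail worth checking with cut is how it treats values that fall exactly on a break. By default (right = TRUE) the intervals are closed on the right, so with these breaks they are (0.5, 1] and (1, Inf]. A small sketch on a hypothetical vector vals, chosen only to hit the boundaries:

```r
# Hypothetical boundary values, not taken from the original data
vals <- c(0.5, 0.75, 1, 1.5)

# Default right = TRUE: intervals (0.5, 1] and (1, Inf]
default_cut <- as.character(cut(vals, c(0.5, 1, Inf), c(">0.5", ">1")))
default_cut
# NA ">0.5" ">0.5" ">1"

# right = FALSE: intervals [0.5, 1) and [1, Inf)
left_closed <- as.character(cut(vals, c(0.5, 1, Inf), c(">0.5", ">1"),
                                right = FALSE))
left_closed
# ">0.5" ">0.5" ">1" ">1"
```

So a value of exactly 1 is labelled ">0.5" by default, whereas a case_when condition written as .x >= 1 would label it ">1". Which behaviour is right depends on how you want the boundaries handled.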
CodePudding user response:
Another base R solution:
ff <- function(z) { x <- rep(NA_character_, length(z)); x[z > .5] <- ">.5"; x[z > 1] <- ">1"; x }
sapply(data, ff)
# x y
# [1,] NA ">.5"
# [2,] NA ">.5"
# [3,] ">.5" ">1"
# [4,] ">.5" ">1"
# [5,] NA NA
# [6,] ">.5" ">1"
# [7,] ">.5" ">1"
# [8,] ">.5" ">1"
# [9,] ">.5" ">1"
#[10,] NA NA
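Note that sapply returns a character matrix here rather than a data frame. If a data frame is needed, a minimal sketch is to assign the lapply result back into the existing columns. The small data frame df below is hypothetical and used only to keep the example self-contained:

```r
# Same recoding helper as above, returning the recoded vector explicitly
ff <- function(z) {
  x <- rep(NA_character_, length(z))
  x[z > .5] <- ">.5"
  x[z > 1]  <- ">1"   # applied second, so it overrides ">.5" where z > 1
  x
}

# Hypothetical example data
df <- data.frame(x = c(0.2, 0.6, 0.9), y = c(0.4, 1.2, 1.8))

m <- sapply(df, ff)     # character matrix
df[] <- lapply(df, ff)  # df[] keeps the data.frame structure and column names

class(m)
class(df)
```

The df[] <- form replaces the contents column by column, so the result is still a data frame with columns x and y.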