How to use case_when with entire dataframe?

I'd like to apply case_when to all columns in the data frame.

library(dplyr)
set.seed(1)
data <- tibble(x = runif(10), y = x * 2) 
data

In every column, I'd like to replace values above 0.5 with the string ">0.5" and values above 1 with the string ">1".

I've tried case_when, but it appears that I have to name each column, like x and y. I'd like to use case_when on the entire data frame without specifying columns.

CodePudding user response：

A purrr solution:

library(dplyr)  # provides case_when() and %>%
library(purrr)

data %>%
map_df(~case_when(.x > 0.5 & .x < 1 ~ ">0.5",
                  .x >= 1 ~ ">1"))

Output:

   x     y    
   <chr> <chr>
 1 NA    >0.5 
 2 NA    >0.5 
 3 >0.5  >1   
 4 >0.5  >1   
 5 NA    NA   
 6 >0.5  >1   
 7 >0.5  >1   
 8 >0.5  >1   
 9 >0.5  >1   
10 NA    NA

CodePudding user response：

Here is a potential solution:

library(tidyverse)

set.seed(1)
data <- tibble(x = runif(10), y = x * 2) 
data
#> # A tibble: 10 × 2
#>         x     y
#>     <dbl> <dbl>
#>  1 0.266  0.531
#>  2 0.372  0.744
#>  3 0.573  1.15 
#>  4 0.908  1.82 
#>  5 0.202  0.403
#>  6 0.898  1.80 
#>  7 0.945  1.89 
#>  8 0.661  1.32 
#>  9 0.629  1.26 
#> 10 0.0618 0.124

data %>%
  mutate(across(everything(),
                ~case_when(.x > 0.5 & .x < 1.0 ~ ">0.5",
                           .x >= 1.0 ~ ">1")))
#> # A tibble: 10 × 2
#>    x     y    
#>    <chr> <chr>
#>  1 <NA>  >0.5 
#>  2 <NA>  >0.5 
#>  3 >0.5  >1   
#>  4 >0.5  >1   
#>  5 <NA>  <NA> 
#>  6 >0.5  >1   
#>  7 >0.5  >1   
#>  8 >0.5  >1   
#>  9 >0.5  >1   
#> 10 <NA>  <NA>

Created on 2021-10-24 by the reprex package (v2.0.1)
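
In the result above, every value at or below 0.5 becomes NA because no case_when condition matches it. If you would rather keep those values, a minimal sketch is to add a fallback branch. This assumes you want the remaining numbers kept as text, and the rounding to 2 digits is an arbitrary choice for illustration (the name `labelled` is also just for this example):

```r
library(dplyr)

set.seed(1)
data <- tibble(x = runif(10), y = x * 2)

# Label values by threshold, testing the larger threshold first.
# Anything at or below 0.5 falls through to the last branch and is kept
# as text (rounded to 2 digits, which is an assumption, not a requirement).
labelled <- data %>%
  mutate(across(everything(),
                ~ case_when(.x > 1   ~ ">1",
                            .x > 0.5 ~ ">0.5",
                            TRUE     ~ as.character(round(.x, 2)))))
labelled
```

All branches must return the same type, which is why the fallback converts the number to character.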

CodePudding user response：

We can use if_all with everything() (to select all the columns) to build a logical vector that is TRUE only for rows where every column meets the condition:

library(dplyr)
data %>%
  mutate(new = case_when(if_all(everything(), `>`, 1) ~ ">1",
                         if_all(everything(), `>`, 0.5) ~ ">0.5"))

Output:

# A tibble: 10 × 3
        x     y new  
    <dbl> <dbl> <chr>
 1 0.266  0.531 <NA> 
 2 0.372  0.744 <NA> 
 3 0.573  1.15  >0.5 
 4 0.908  1.82  >0.5 
 5 0.202  0.403 <NA> 
 6 0.898  1.80  >0.5 
 7 0.945  1.89  >0.5 
 8 0.661  1.32  >0.5 
 9 0.629  1.26  >0.5 
10 0.0618 0.124 <NA>

NOTE: Since the OP asked about the entire dataset, this creates a single new column, with each row's conditions evaluated across all columns at once.

If the OP meant separate columns, use between (which includes both endpoints) and test for values above 1 explicitly, so that values below 0.5 are not labelled "> 1":

data %>% 
   mutate(across(everything(), 
   ~ case_when(between(.x, 0.5, 1) ~ "> 0.5", .x > 1 ~ "> 1")))
# A tibble: 10 × 2
   x     y    
   <chr> <chr>
 1 <NA>  > 0.5
 2 <NA>  > 0.5
 3 > 0.5 > 1  
 4 > 0.5 > 1  
 5 <NA>  <NA> 
 6 > 0.5 > 1  
 7 > 0.5 > 1  
 8 > 0.5 > 1  
 9 > 0.5 > 1  
10 <NA>  <NA> 

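If we want to do this on all values at once, without across:

out <- as.data.frame(data)
vals <- unlist(data, use.names = FALSE)  # all values, column by column
out[] <- case_when(vals > 1 ~ "> 1", vals > 0.5 ~ "> 0.5")  # test the larger threshold first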

CodePudding user response：

You can use cut:

library(dplyr)

data %>%
  mutate(across(everything(), ~cut(.x, c(0.5, 1, Inf), labels = c(">0.5", ">1"))))

#    x     y    
#   <fct> <fct>
# 1 NA    >0.5 
# 2 NA    >0.5 
# 3 >0.5  >1   
# 4 >0.5  >1   
# 5 NA    NA   
# 6 >0.5  >1   
# 7 >0.5  >1   
# 8 >0.5  >1   
# 9 >0.5  >1   
#10 NA    NA

In base R, with lapply:

data[] <- lapply(data, function(x) cut(x, c(0.5, 1, Inf), c(">0.5", ">1")))
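
Note that cut returns factors, not strings. If character columns are needed, as in the question, one possible sketch is to wrap the call in as.character. The data frame is rebuilt here with base R only so the snippet runs on its own; the argument name `col` is just for illustration:

```r
set.seed(1)
data <- data.frame(x = runif(10))
data$y <- data$x * 2

# cut() returns a factor; as.character() turns its labels into plain strings.
# Values at or below 0.5 fall outside every interval and become NA.
data[] <- lapply(data, function(col) {
  as.character(cut(col, breaks = c(0.5, 1, Inf), labels = c(">0.5", ">1")))
})
str(data)
```

Assigning into `data[]` keeps the data frame structure and column names while replacing every column.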

CodePudding user response：

Another base R solution:

ff <- function(z) {
  x <- rep(NA_character_, length(z))  # start with all NA
  x[z > 0.5] <- ">.5"
  x[z > 1] <- ">1"                    # overwrites ">.5" where z is also above 1
  x
}
sapply(data, ff)
#      x     y    
# [1,] NA    ">.5"
# [2,] NA    ">.5"
# [3,] ">.5" ">1" 
# [4,] ">.5" ">1" 
# [5,] NA    NA   
# [6,] ">.5" ">1" 
# [7,] ">.5" ">1" 
# [8,] ">.5" ">1" 
# [9,] ">.5" ">1" 
#[10,] NA    NA
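
As the output shows, sapply returns a character matrix rather than a data frame. If a data frame is wanted, a minimal sketch is to use lapply and assign into `result[]` instead. The helper name `label_values` and the copy `result` are made up for this example; the helper follows the same logic as the function above:

```r
set.seed(1)
data <- data.frame(x = runif(10))
data$y <- data$x * 2

# Hypothetical helper: label one numeric vector by threshold.
label_values <- function(z) {
  out <- rep(NA_character_, length(z))
  out[z > 0.5] <- ">.5"
  out[z > 1] <- ">1"   # overwrites ">.5" where z is also above 1
  out
}

# lapply() returns a list of columns; assigning into result[] keeps
# the data frame shape and column names.
result <- data
result[] <- lapply(data, label_values)
result
```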