How to truncate multiple columns in R

Time:12-06

I need to truncate many columns to the range -3.0 to 3.0. That is, in a new variable, any value greater than 3.0 should be recoded as 3.0, any value less than -3.0 should be recoded as -3.0, and all other values should stay as they are.

Here is an example dataset:

library(tidyverse)
MyData <- tibble( a = c(2.3, 3.0, -1.5, 3.7, -4.7, 5.2),
                  b = c(3.6, 1.52, -5.4, 4.6, 1.5, 2.2),
                  c = c(1.0, -2.6, -1.2, 2.5, -4.0, 3.0))

I know how to do this by creating a new variable for each old one with mutate() and case_when(), but I have too many variables to do it manually. Is there a shorter, more elegant way? I would like output like the one produced by this manual code:

MyData %>% 
  mutate(Ta = case_when(a >= 3.0 ~ 3.0,
                        a <= -3.0 ~ -3.0,
                        TRUE ~ a),
         Tb = case_when(b >= 3.0 ~ 3.0,
                        b <= -3.0 ~ -3.0,
                        TRUE ~ b),
         Tc = case_when(c >= 3.0 ~ 3.0,
                        c <= -3.0 ~ -3.0,
                        TRUE ~ c))

# A tibble: 6 x 6
      a     b     c    Ta    Tb    Tc
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1   2.3  3.6    1     2.3  3      1  
2   3    1.52  -2.6   3    1.52  -2.6
3  -1.5 -5.4   -1.2  -1.5 -3     -1.2
4   3.7  4.6    2.5   3    3      2.5
5  -4.7  1.5   -4    -3    1.5   -3  
6   5.2  2.2    3     3    2.2    3  

Answer:

You could define a function and then apply it to many columns with across().

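pmin(3, pmax(x, -3)) is one way to constrain a vector (i.e., a column of a data frame) to the range -3 to 3. pmax(x, -3) takes the elementwise maximum of x and -3, which raises every value below -3 to -3; pmin(3, ...) then takes the elementwise minimum of that result and 3, which lowers every value above 3 to 3.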
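To see the two steps separately, here is a small base R sketch on a single vector (column a of MyData). The helper name clamp() and its lo/hi arguments are hypothetical, added only to make the bounds explicit; it computes the same thing as pmin(3, pmax(x, -3)).

```r
# Hypothetical helper: clamp every element of x into [lo, hi]
clamp <- function(x, lo = -3, hi = 3) {
  stopifnot(lo <= hi)
  step1 <- pmax(x, lo)  # raise anything below lo up to lo
  pmin(step1, hi)       # then lower anything above hi down to hi
}

x <- c(2.3, 3.0, -1.5, 3.7, -4.7, 5.2)  # column a from MyData
pmax(x, -3)  # 2.3  3.0 -1.5  3.7 -3.0  5.2  (only the lower bound applied)
clamp(x)     # 2.3  3.0 -1.5  3.0 -3.0  3.0  (both bounds applied)
```

Note that pmin() and pmax() return NA for NA inputs unless na.rm = TRUE is given.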

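The .names argument of across() is a glue specification for naming the output columns. With .names = "T{.col}", {.col} is replaced by each original column name, so the results are added as new columns Ta, Tb and Tc instead of overwriting a, b and c.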

cap3 <- function(x) { pmin(3, pmax(x, -3)) }

MyData %>%
  mutate(across(a:c, cap3, .names = "T{.col}"))

# Equivalent column selections for this data (all three columns are numeric):
#   mutate(across(1:3, cap3, .names = "T{.col}"))
#   mutate(across(everything(), cap3, .names = "T{.col}"))

Result:

# A tibble: 6 x 6
      a     b     c    Ta    Tb    Tc
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1   2.3  3.6    1     2.3  3      1  
2   3    1.52  -2.6   3    1.52  -2.6
3  -1.5 -5.4   -1.2  -1.5 -3     -1.2
4   3.7  4.6    2.5   3    3      2.5
5  -4.7  1.5   -4    -3    1.5   -3  
6   5.2  2.2    3     3    2.2    3  
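For readers not using dplyr, the same "add a T-prefixed copy of each column" idea can be sketched in base R with lapply() and paste0(). This is an alternative to across(), not a description of how across() works internally. cap3 is redefined and MyData is rebuilt as a plain data frame so the block runs on its own.

```r
cap3 <- function(x) { pmin(3, pmax(x, -3)) }

MyData <- data.frame(a = c(2.3, 3.0, -1.5, 3.7, -4.7, 5.2),
                     b = c(3.6, 1.52, -5.4, 4.6, 1.5, 2.2),
                     c = c(1.0, -2.6, -1.2, 2.5, -4.0, 3.0))

cols <- c("a", "b", "c")  # columns to truncate
# Apply cap3 to each column and store the results under the new names
# Ta, Tb, Tc, leaving a, b, c untouched (similar to .names = "T{.col}")
MyData[paste0("T", cols)] <- lapply(MyData[cols], cap3)
names(MyData)  # "a" "b" "c" "Ta" "Tb" "Tc"
```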

Answer:

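Convert the data to a matrix, clamp it with pmin() and pmax(), and bind the result to MyData (this works because all columns are numeric). Naming the bound matrix T = . gives the new columns the prefix T: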

MyData %>%
  as.matrix %>%
  pmin(3) %>%
  pmax(-3) %>%
  cbind(MyData, T = .)

giving:

     a     b    c  T.a   T.b  T.c
1  2.3  3.60  1.0  2.3  3.00  1.0
2  3.0  1.52 -2.6  3.0  1.52 -2.6
3 -1.5 -5.40 -1.2 -1.5 -3.00 -1.2
4  3.7  4.60  2.5  3.0  3.00  2.5
5 -4.7  1.50 -4.0 -3.0  1.50 -3.0
6  5.2  2.20  3.0  3.0  2.20  3.0
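The same matrix idea can be sketched in base R without the magrittr . placeholder. This version also renames the clamped columns to Ta, Tb, Tc to match the question, instead of the T.a style that cbind(..., T = .) produces. It assumes every column is numeric; as.matrix() on a data frame with a character column would give a character matrix.

```r
MyData <- data.frame(a = c(2.3, 3.0, -1.5, 3.7, -4.7, 5.2),
                     b = c(3.6, 1.52, -5.4, 4.6, 1.5, 2.2),
                     c = c(1.0, -2.6, -1.2, 2.5, -4.0, 3.0))

m <- as.matrix(MyData)                   # numeric matrix with column names a, b, c
m <- pmax(pmin(m, 3), -3)                # elementwise clamp; dim and dimnames are kept
colnames(m) <- paste0("T", colnames(m))  # a, b, c -> Ta, Tb, Tc
result <- cbind(MyData, m)               # data frame with a, b, c, Ta, Tb, Tc
```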

Answer:

Put the code you want to apply to each column in a function, then apply that function to all columns with across():

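library(dplyr)

# Recode one column: cap values at 3.0 and -3.0, keep everything in between
func <- function(a) {
  case_when(a >= 3.0 ~ 3.0,
            a <= -3.0 ~ -3.0,
            TRUE ~ a)
}

# Apply func to every column; "T{.col}" names the new columns Ta, Tb, Tc.
# (Omitting .cols was deprecated in dplyr 1.1.0, so everything() is explicit.)
MyData %>%
  mutate(across(everything(), func, .names = "T{.col}"))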

#    a     b     c    Ta    Tb    Tc
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1   2.3  3.6    1     2.3  3      1  
#2   3    1.52  -2.6   3    1.52  -2.6
#3  -1.5 -5.4   -1.2  -1.5 -3     -1.2
#4   3.7  4.6    2.5   3    3      2.5
#5  -4.7  1.5   -4    -3    1.5   -3  
#6   5.2  2.2    3     3    2.2    3  
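If dplyr is not available, the three-way rule in func() can be sketched in base R with nested ifelse(). The name func_base() is hypothetical; it mirrors the case_when() branches, and the quick check below shows it agrees with the pmin()/pmax() approach from the first answer, including on NA and on values exactly at the bounds.

```r
# Hypothetical base R counterpart of func(): same three branches as case_when()
func_base <- function(a) {
  ifelse(a >= 3.0, 3.0,
         ifelse(a <= -3.0, -3.0, a))
}

cap3 <- function(x) { pmin(3, pmax(x, -3)) }

x <- c(-4.7, -3.0, 0, 3.0, 5.2, NA)
func_base(x)                      # -3 -3 0 3 3 NA
identical(func_base(x), cap3(x))  # TRUE
```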