Applying log to all numeric >0 (across() with two .cols conditions)

Time:03-11

I am trying to take the log of multiple variables in a data frame that also contains non-numeric variables, and I would like to apply the function only to those numeric variables that contain no zeros or negative values.

This is where I am at:

library(dplyr)

# create a data frame with numeric and factor variables
a <- c(3, -1, 0, 5, 2)
b <- c(1, 3, 2, 1, 4)
c <- c(9, -2, 3, -5, 1)
d <- c(3, 0, 6, 1, 5)
e <- c("red", "blu", "yellow", "green", "white")
f <- c(0, 1, 1, 0, 0)
g <- c(3, 1, 1, 4, 2)

df <- data.frame(a, b, c, d, e, f, g) %>%
  mutate(f = factor(f))  # turn the 0/1 column into a factor

# apply the transformation to all numeric variables
df.log <- df %>%
  as_tibble() %>%
  mutate(across(
    # ideally I would add a second condition here (all values > 0),
    # e.g. `& all() > 0`, but that doesn't work
    .cols = where(is.numeric),
    .fns = list(log = log),
    .names = "{.col}_{.fn}"))

With the code above I get NaN for negative values and -Inf for zeros. I could then drop the columns containing those values, but I'd like to find a clean way to do it all in one step.

Another idea was to remove the columns with values <= 0 beforehand, as follows:

df.skim <- df %>% 
  select_if(is.numeric)

df.skim <- df.skim[, sapply(df.skim, min) > 0]

and then apply the log to the remaining columns, but this way I also drop the key column and cannot easily merge the data back.

CodePudding user response:

You can write a small predicate function and pass it to across() inside where():

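# TRUE only for numeric columns whose values are all strictly positive
numeric_no_zero <- function(x) {
  if (!is.numeric(x)) return(FALSE)  # skip character and factor columns
  if (any(x <= 0)) return(FALSE)     # skip columns with zeros or negatives
  TRUE
}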

You then use it like this:

df %>% 
  as_tibble() %>% 
  mutate(across(
    .cols = where(numeric_no_zero),
    .fns = list(log = log),
    .names = "{.col}_{.fn}"))
#> # A tibble: 5 x 9
#>       a     b     c     d e      f         g b_log g_log
#>   <dbl> <dbl> <dbl> <dbl> <chr>  <fct> <dbl> <dbl> <dbl>
#> 1     3     1     9     3 red    0         3 0     1.10 
#> 2    -1     3    -2     0 blu    1         1 1.10  0    
#> 3     0     2     3     6 yellow 1         1 0.693 0    
#> 4     5     1    -5     1 green  0         4 0     1.39 
#> 5     2     4     1     5 white  0         2 1.39  0.693

Created on 2022-03-10 by the reprex package (v2.0.1)
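
To see which columns the predicate picks before transforming anything, it can be applied to each column directly. This is only a minimal sketch in base R: the example data from the question is rebuilt here so the snippet runs on its own, and `selected` is just an illustrative name.

```r
# Example data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  a = c(3, -1, 0, 5, 2),
  b = c(1, 3, 2, 1, 4),
  c = c(9, -2, 3, -5, 1),
  d = c(3, 0, 6, 1, 5),
  e = c("red", "blu", "yellow", "green", "white"),
  f = factor(c(0, 1, 1, 0, 0)),
  g = c(3, 1, 1, 4, 2)
)

# Same predicate as in the answer above
numeric_no_zero <- function(x) {
  if (!is.numeric(x)) return(FALSE)
  if (any(x <= 0)) return(FALSE)
  TRUE
}

# Apply the predicate column by column; vapply() guarantees a logical vector
selected <- names(df)[vapply(df, numeric_no_zero, logical(1))]
selected
#> [1] "b" "g"
```

Only b and g pass, which matches the b_log and g_log columns in the output above.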

CodePudding user response:

You can use where() with an anonymous function to specify more complex conditions like yours:

library(tidyverse)

df.log <- df %>% 
  as_tibble() %>%
  mutate(across(
    .cols = where(~ is.numeric(.x) && all(.x > 0)),
    .fns = list(log = log),
    .names = "{.col}_{.fn}"))

Output:

# A tibble: 5 x 9
      a     b     c     d e      f         g b_log g_log
  <dbl> <dbl> <dbl> <dbl> <chr>  <fct> <dbl> <dbl> <dbl>
1     3     1     9     3 red    0         3 0     1.10 
2    -1     3    -2     0 blu    1         1 1.10  0    
3     0     2     3     6 yellow 1         1 0.693 0    
4     5     1    -5     1 green  0         4 0     1.39 
5     2     4     1     5 white  0         2 1.39  0.693
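
One caveat that goes beyond the original question: if a numeric column contains NA, `all(.x > 0)` can itself return NA, and where() expects the predicate to return TRUE or FALSE. A possible variant, sketched here under the assumption that missing values should simply be ignored when checking for positivity, passes `na.rm = TRUE` to all(). The data frame `df_na` and the helper `positive_ignoring_na` are hypothetical names made up for this illustration.

```r
library(dplyr)

# Hypothetical data: `x` has a missing value, `y` contains a zero
df_na <- tibble(
  x = c(1, NA, 3),
  y = c(0, 2, 4),
  z = c("a", "b", "c")
)

# Numeric columns whose non-missing values are all strictly positive.
# `&&` short-circuits, so the comparison never runs on non-numeric columns.
positive_ignoring_na <- function(col) {
  is.numeric(col) && all(col > 0, na.rm = TRUE)
}

df_na_log <- df_na %>%
  mutate(across(
    .cols = where(positive_ignoring_na),
    .fns = list(log = log),
    .names = "{.col}_{.fn}"))

names(df_na_log)
#> [1] "x"     "y"     "z"     "x_log"
```

Here only x is transformed; the NA in x stays NA in x_log, and y is skipped because it contains a zero.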