I am trying to take the log of several variables in a data frame that also contains non-numeric variables, and I would like to apply the function only to numeric variables that contain no zeros or negative values.
This is where I am at:
library(dplyr)

# create a data frame with numeric, character, and factor variables
a <- c(3, -1, 0, 5, 2)
b <- c(1, 3, 2, 1, 4)
c <- c(9, -2, 3, -5, 1)
d <- c(3, 0, 6, 1, 5)
e <- c("red", "blu", "yellow", "green", "white")
f <- c(0, 1, 1, 0, 0)
g <- c(3, 1, 1, 4, 2)
df <- data.frame(a, b, c, d, e, f, g) %>%
  mutate_at("f", factor)
# apply the transformation to all numeric variables
df.log <- df %>%
  as_tibble() %>%
  mutate(across(
    .cols = where(is.numeric), # ideally I would add the condition "all values > 0" here, but `& all() > 0` doesn't work
    .fns = list(log = log),
    .names = "{.col}_{.fn}"))
With the code above I get NaN for negative values and -Inf for zeros. I could then drop the columns containing those values, but I'd like a clean way to do it all at once.
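For reference, those values come from how base R's log() treats non-positive input. A minimal illustration (the vector x is made up for the example):

```r
# log() of a negative number is NaN (R also emits a warning),
# and log(0) is -Inf; suppressWarnings() only hides the warning
x <- suppressWarnings(log(c(-1, 0, 1)))

is.nan(x[1])       # TRUE: log(-1) is NaN
is.infinite(x[2])  # TRUE: log(0) is -Inf
x[3]               # 0:    log(1)
```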
Another idea was to first remove the columns containing values <= 0, as follows:
df.skim <- df %>%
  select_if(is.numeric)
df.skim <- df.skim[, sapply(df.skim, min) > 0]
and then apply the log to the remaining columns, but this way I also drop the key column and cannot easily merge the data back.
CodePudding user response:
You can write a small predicate function and pass it to where() inside across():
# TRUE only for numeric columns whose values are all strictly positive
numeric_no_zero <- function(x) {
  if (!is.numeric(x)) return(FALSE)
  if (any(x <= 0)) return(FALSE)
  TRUE
}
Which you use like this:
df %>%
  as_tibble() %>%
  mutate(across(
    .cols = where(numeric_no_zero),
    .fns = list(log = log),
    .names = "{.col}_{.fn}"))
#> # A tibble: 5 x 9
#>       a     b     c     d e      f         g b_log g_log
#>   <dbl> <dbl> <dbl> <dbl> <chr>  <fct> <dbl> <dbl> <dbl>
#> 1     3     1     9     3 red    0         3 0     1.10
#> 2    -1     3    -2     0 blu    1         1 1.10  0
#> 3     0     2     3     6 yellow 1         1 0.693 0
#> 4     5     1    -5     1 green  0         4 0     1.39
#> 5     2     4     1     5 white  0         2 1.39  0.693
Created on 2022-03-10 by the reprex package (v2.0.1)
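One caveat, in case the real data contains missing values: any(x <= 0) returns NA when a column has an NA and no non-positive value, and if (NA) raises an error. The example data has no NAs, so this is only a sketch of one way to handle it; the name numeric_all_positive and the choice to ignore NAs are assumptions, not part of the answer above.

```r
# Hypothetical NA-tolerant variant of the predicate: NAs are ignored when
# checking positivity, so log() will simply propagate them as NA.
numeric_all_positive <- function(x) {
  is.numeric(x) && all(x > 0, na.rm = TRUE)
}

numeric_all_positive(c(1, NA, 3))    # TRUE:  remaining values are positive
numeric_all_positive(c(1, -1, NA))   # FALSE: contains a negative value
numeric_all_positive(c(0, 2))        # FALSE: contains a zero
numeric_all_positive(c("a", "b"))    # FALSE: not numeric
numeric_all_positive(factor(c(0, 1))) # FALSE: factors are not numeric
```

It would be used the same way, as where(numeric_all_positive). Note that an all-NA numeric column passes this check; drop na.rm = TRUE and wrap the result in isTRUE() if columns with NAs should be skipped instead.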
CodePudding user response:
You can use where() with an anonymous function to specify more complex conditions like yours:
library(tidyverse)
df.log <- df %>%
  as_tibble() %>%
  mutate(across(
    .cols = where(~ is.numeric(.x) && all(.x > 0)),
    .fns = list(log = log),
    .names = "{.col}_{.fn}"))
Output:
# A tibble: 5 x 9
      a     b     c     d e      f         g b_log g_log
  <dbl> <dbl> <dbl> <dbl> <chr>  <fct> <dbl> <dbl> <dbl>
1     3     1     9     3 red    0         3 0     1.10
2    -1     3    -2     0 blu    1         1 1.10  0
3     0     2     3     6 yellow 1         1 0.693 0
4     5     1    -5     1 green  0         4 0     1.39
5     2     4     1     5 white  0         2 1.39  0.693
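If you would rather avoid dplyr, the same idea can be sketched in base R. This is only an illustrative equivalent, not part of the answers above; the helper names ok and keep are made up here, and the data frame is rebuilt so the snippet runs on its own.

```r
# Rebuild the example data (f is created as a factor directly)
df <- data.frame(
  a = c(3, -1, 0, 5, 2),
  b = c(1, 3, 2, 1, 4),
  c = c(9, -2, 3, -5, 1),
  d = c(3, 0, 6, 1, 5),
  e = c("red", "blu", "yellow", "green", "white"),
  f = factor(c(0, 1, 1, 0, 0)),
  g = c(3, 1, 1, 4, 2)
)

# Flag columns that are numeric and strictly positive
ok <- vapply(df, function(x) is.numeric(x) && all(x > 0), logical(1))
keep <- names(df)[ok]

# Append the logged columns; every original column (including the key) stays
df[paste0(keep, "_log")] <- lapply(df[keep], log)

names(df)
```

Because the new columns are appended rather than selected out, there is nothing to merge back afterwards.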