dplyr::mutate_if() with multiple conditions including column class not working

Time:04-27

I'm really confused about why this is not working:

df <- data.frame(a = c("1", "2", "3"),
                   b = c(2, 3, 4),
                   c = c(4, 3, 2),
                   d = c("1", "5", "9"))
  
varnames = c("a", "c")
  
df %>% 
  mutate_if((is.character(.) & names(.) %in% varnames),
            funs(mean(as.numeric(.)))) 
  a b c d
1 1 2 4 1
2 2 3 3 5
3 3 4 2 9

Expected output would be

  a b c d
1 2 2 4 1
2 2 3 3 5
3 2 4 2 9

It works with a single condition, but I've only gotten the class condition to work using this formulation (which I don't know how to combine with the column name condition):

df %>% 
  mutate_if(function(col) is.character(col),
              funs(mean(as.numeric(.)))) 
  a b c d
1 2 2 4 5
2 2 3 3 5
3 2 4 2 5

However, is.factor seems to work fine together with the column names?

df %>% 
    mutate_if(!is.factor(.)  & (names(.) %in% varnames),
              funs(mean(as.numeric(.)))) 
  a b c d
1 2 2 3 1
2 2 3 3 5
3 2 4 3 9

CodePudding user response:

Note that mutate_if is being phased out in favour of across, so the following is perhaps what you want...

df %>% 
    mutate(across(where(is.character) & matches(varnames), ~mean(as.numeric(.))))

  a b c d
1 2 2 4 1
2 2 3 3 5
3 2 4 2 9
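
One caveat about the snippet above: matches() treats its argument as regular expressions, so "a" would also select a column such as "alpha" if one existed. A minimal sketch of an exact-name variant, assuming dplyr 1.0.0 or later and the same df and varnames as in the question (all_of() is swapped in here; it is not part of the original answer):

```r
library(dplyr)

df <- data.frame(a = c("1", "2", "3"),
                 b = c(2, 3, 4),
                 c = c(4, 3, 2),
                 d = c("1", "5", "9"),
                 stringsAsFactors = FALSE)
varnames <- c("a", "c")

# where(is.character) keeps character columns; all_of(varnames) keeps exactly
# the listed names; "&" takes the intersection of the two selections.
out <- df %>%
  mutate(across(where(is.character) & all_of(varnames),
                ~ mean(as.numeric(.x))))

print(out)
```

Only column a satisfies both conditions, so it is the only column that gets replaced by its mean.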

CodePudding user response:

mutate_if() doesn't work the way you are using it. According to its help page, the second argument, which sets the condition, must be one of the following:

  1. A predicate function to be applied to each column. (This can be an ordinary function or a lambda of the form ~ fun(.).)
  2. A logical vector.

If you want to calculate means for character columns, the correct syntax is

Code 1:
df %>% mutate_if(~ is.character(.), funs(mean(as.numeric(.))))

instead of

df %>% mutate_if(is.character(.), funs(mean(as.numeric(.))))

which results in an error. Next, consider the following code:

Code 2:
df %>% mutate_if(names(.) %in% varnames, funs(mean(as.numeric(.))))

In principle, a mutate_if predicate only sees column values, not column names, so ~ names(.) would make no sense there. Why, then, does Code 2 work without the ~ in front of names(.)? Because of how the pipe operator (%>%) handles its placeholder, the "." in names(.) refers to df itself rather than to each column of df. Code 2 is therefore equivalent to

df %>% mutate_if(names(df) %in% varnames,
                 funs(mean(as.numeric(.))))

Here a logical vector is passed rather than a predicate function. names(df) %in% varnames returns TRUE FALSE TRUE FALSE, so a and c are selected. This explains why your first block fails but the last one works.
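
The placeholder behaviour described above can be checked directly. This is a small sketch, not part of the original answer; show_second is a hypothetical helper introduced only to expose what the pipe passes as the second argument:

```r
library(magrittr)  # provides %>% (dplyr re-exports the same operator)

df <- data.frame(a = c("1", "2", "3"),
                 b = c(2, 3, 4),
                 c = c(4, 3, 2),
                 d = c("1", "5", "9"),
                 stringsAsFactors = FALSE)
varnames <- c("a", "c")

# Hypothetical helper: ignores the data and returns its second argument
# unchanged, so we can inspect the value the pipe hands over.
show_second <- function(data, cond) cond

piped  <- df %>% show_second(names(.) %in% varnames)
direct <- names(df) %in% varnames

print(piped)
print(identical(piped, direct))
```

Both expressions give the same logical vector, one element per column of df, which is why the "." here cannot be a per-column test.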


The first block
df %>% mutate_if(is.character(.) & names(.) %in% varnames,
                 funs(mean(as.numeric(.)))) 

Replacing every "." with df gives:

  • is.character(df) returns FALSE
  • names(df) %in% varnames returns TRUE FALSE TRUE FALSE

The & operator recycles the single FALSE across the four-element vector, so the final condition is FALSE FALSE FALSE FALSE and no column is selected.
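
The same evaluation can be reproduced in plain base R, assuming the df and varnames from the question:

```r
df <- data.frame(a = c("1", "2", "3"),
                 b = c(2, 3, 4),
                 c = c(4, 3, 2),
                 d = c("1", "5", "9"),
                 stringsAsFactors = FALSE)
varnames <- c("a", "c")

# A data frame is a list, not a character vector, so this is a single FALSE.
whole_df_is_chr <- is.character(df)

# One logical value per column name.
name_hit <- names(df) %in% varnames

# The length-1 FALSE is recycled against the length-4 vector.
condition <- whole_df_is_chr & name_hit

print(whole_df_is_chr)
print(name_hit)
print(condition)
```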

The last block
df %>% mutate_if(!is.factor(.) & (names(.) %in% varnames),
                 funs(mean(as.numeric(.)))) 

Replacing every "." with df gives:

  • !is.factor(df) returns TRUE
  • names(df) %in% varnames returns TRUE FALSE TRUE FALSE

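The & operator recycles the single TRUE, so the final condition is TRUE FALSE TRUE FALSE and a and c are selected. Note that the class check is applied to df as a whole, not to each column, so it only appears to work: the numeric column c is selected purely because of its name.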
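
To combine both conditions with mutate_if itself, one option is to build the logical vector column by column before passing it in. A minimal sketch, assuming the df and varnames from the question; the names is_chr and keep are illustrative, and a formula lambda is used in place of funs(), which has been deprecated since dplyr 0.8.0:

```r
library(dplyr)

df <- data.frame(a = c("1", "2", "3"),
                 b = c(2, 3, 4),
                 c = c(4, 3, 2),
                 d = c("1", "5", "9"),
                 stringsAsFactors = FALSE)
varnames <- c("a", "c")

# Test the class of each column separately (one logical value per column).
is_chr <- vapply(df, is.character, logical(1))

# Combine with the name condition; both vectors have one element per column.
keep <- is_chr & names(df) %in% varnames

# Pass the logical vector as the condition of mutate_if().
out <- df %>% mutate_if(keep, ~ mean(as.numeric(.x)))

print(keep)
print(out)
```

Only a is both a character column and listed in varnames, so only a is replaced by its mean, which matches the expected output in the question.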
