I'm really confused about why this is not working:
library(dplyr)
df <- data.frame(a = c("1", "2", "3"),
                 b = c(2, 3, 4),
                 c = c(4, 3, 2),
                 d = c("1", "5", "9"))
varnames <- c("a", "c")
df %>%
  mutate_if((is.character(.) & names(.) %in% varnames),
            funs(mean(as.numeric(.))))
  a b c d
1 1 2 4 1
2 2 3 3 5
3 3 4 2 9
Expected output would be
  a b c d
1 2 2 4 1
2 2 3 3 5
3 2 4 2 9
It works with a single condition, but I've only gotten the class condition to work using this formulation, which I don't know how to combine with the column name condition:
df %>%
  mutate_if(function(col) is.character(col),
            funs(mean(as.numeric(.))))
  a b c d
1 2 2 4 5
2 2 3 3 5
3 2 4 2 5
However, is.factor seems to work fine with the column names?
df %>%
  mutate_if(!is.factor(.) & (names(.) %in% varnames),
            funs(mean(as.numeric(.))))
  a b c d
1 2 2 3 1
2 2 3 3 5
3 2 4 3 9
CodePudding user response:
Note that mutate_if() has been superseded by across(), so the following is perhaps what you want:
df %>%
  # all_of() selects exactly the column names listed in varnames
  mutate(across(where(is.character) & all_of(varnames), ~ mean(as.numeric(.))))
  a b c d
1 2 2 4 1
2 2 3 3 5
3 2 4 2 9
CodePudding user response:
mutate_if() doesn't work the way you are using it. According to its help page, the second argument, which sets the condition, must be one of the following:
- A predicate function to be applied to the columns. This can be a normal function or a lambda function of the form ~ fun(.).
- A logical vector.
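As a rough illustration of the two accepted forms, the sketch below selects the same columns once with a predicate function and once with a hand-written logical vector. The conversion to numeric and the names by_predicate and by_vector are only placeholders for this example; df is the data frame from the question.

```r
library(dplyr)

df <- data.frame(a = c("1", "2", "3"),
                 b = c(2, 3, 4),
                 c = c(4, 3, 2),
                 d = c("1", "5", "9"),
                 stringsAsFactors = FALSE)

# Form 1: a predicate function, called once for each column
by_predicate <- df %>% mutate_if(is.character, as.numeric)

# Form 2: a logical vector with one element per column (a, b, c, d)
by_vector <- df %>% mutate_if(c(TRUE, FALSE, FALSE, TRUE), as.numeric)

# Both forms pick the character columns a and d
identical(by_predicate, by_vector)
```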
If you want to calculate means for character columns, the correct syntax is
Code 1:
df %>% mutate_if(~ is.character(.), funs(mean(as.numeric(.))))
instead of
df %>% mutate_if(is.character(.), funs(mean(as.numeric(.))))
which results in an error. Then, let's talk about the following code:
Code 2:
df %>% mutate_if(names(.) %in% varnames, funs(mean(as.numeric(.))))
In principle, mutate_if passes only the column values to the predicate, not the column names, so ~ names(.) would make no sense there. So why does Code 2 work fine without the ~ symbol in front of names(.)? The reason is that the "." in names(.) actually refers to df itself rather than to each column of df, because of how the pipe operator (%>%) works. Therefore, Code 2 is executed as if it were
df %>% mutate_if(names(df) %in% varnames,
funs(mean(as.numeric(.))))
where a logical vector is passed to it rather than a predicate function. names(df) %in% varnames returns TRUE FALSE TRUE FALSE, and hence a and c are selected. This explains why your first block fails but the last one works.
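The substitution of "." by df can be checked outside of mutate_if. The following is a minimal sketch, assuming the df and varnames from the question; the names piped and direct are only used for this illustration.

```r
library(dplyr)  # re-exports the %>% pipe

df <- data.frame(a = c("1", "2", "3"),
                 b = c(2, 3, 4),
                 c = c(4, 3, 2),
                 d = c("1", "5", "9"),
                 stringsAsFactors = FALSE)
varnames <- c("a", "c")

# Inside braces the pipe only binds "." to its left-hand side,
# so names(.) here is evaluated on the whole data frame.
piped  <- df %>% { names(.) %in% varnames }
direct <- names(df) %in% varnames

identical(piped, direct)  # TRUE
direct                    # TRUE FALSE TRUE FALSE
```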
The first block
df %>% mutate_if(is.character(.) & names(.) %in% varnames,
funs(mean(as.numeric(.))))
Replacing every "." with df, you find that
- is.character(df) returns FALSE
- names(df) %in% varnames returns TRUE FALSE TRUE FALSE
The & operator recycles the single FALSE, so the final condition is FALSE FALSE FALSE FALSE and hence no column is selected.
The last block
df %>% mutate_if(!is.factor(.) & (names(.) %in% varnames),
funs(mean(as.numeric(.))))
Replacing every "." with df, you find that
- !is.factor(df) returns TRUE
- names(df) %in% varnames returns TRUE FALSE TRUE FALSE
The & operator recycles the single TRUE, so the final condition is TRUE FALSE TRUE FALSE and hence a and c are selected.
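Following the same reasoning, one possible way to combine the class condition with the name condition is to build the logical vector yourself, testing the class column by column instead of on df as a whole. This is only a sketch under that assumption; is_chr, keep and result are illustrative names, and the lambda form ~ mean(as.numeric(.)) is used in place of funs().

```r
library(dplyr)

df <- data.frame(a = c("1", "2", "3"),
                 b = c(2, 3, 4),
                 c = c(4, 3, 2),
                 d = c("1", "5", "9"),
                 stringsAsFactors = FALSE)
varnames <- c("a", "c")

# is.character(df) tests the data frame as a whole (a single FALSE);
# vapply() tests each column and returns one value per column.
is_chr <- unname(vapply(df, is.character, logical(1)))

# Both sides now have length ncol(df), so & combines them element-wise.
keep <- is_chr & names(df) %in% varnames

# Only columns that are character AND listed in varnames are changed.
result <- df %>% mutate_if(keep, ~ mean(as.numeric(.)))
result
```

Here keep is TRUE only for a, so a is replaced by its mean while b, c and d are left as they are, which matches the expected output in the question.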