I need to create multiple new data frames based on different filters that contain two variable counts “d” & “e” based of the values in columns “a”, “b” and “c”. I have created a function for this that works as long as at least one column has a value. However, sometimes certain groups will have no answer for a, b or c. I want both d and e to both return zero in the columns when this happens but instead it says “No data available in table”. I’ve added my code below.
f_calculate_net = function(data)
{ data %>% mutate(a = ifelse("a" %in% colnames(data), a, 0)) %>%
mutate(b = ifelse("b" %in% colnames(data), b, 0)) %>%
mutate(c = ifelse("c" %in% colnames(data), c, 0)) %>%
mutate(d = ifelse(a b c == 0, 0, ((a/(a b))*c) a)) %>%
mutate(e = ifelse(a b c == 0, 0, ((b/(a b))*c) b)) %>%
select(d,e) }
CodePudding user response:
Your main problem here is in the way you are using ifelse
. The expression "a" %in% colnames(data)
always returns a length-1 logical vector (either TRUE
or FALSE
). So the output of the expression ifelse("a" %in% colnames(data), a, 0)
will also be of length 1. It will return either the first element of a
or a single 0. Since this is inside a mutate
call, a
will either be overwritten by the first element of a
, or will be created as a column of zeros. Instead of ifelse
you should use
if(!"a" %in% colnames(data)) data$a <- 0
And the same for columns b
and c
.
You will sometimes still get a NaN
entry in columns d
and e
here if both a
and b
are 0, but c
isn't, since your expression ((a/(a b))*c) a)
divides by the sum of a
and b
. You should only check whether a b == 0
, since in that case you should return a 0
So the fixed function would be something like:
f_calculate_net = function(data) {
if(!"a" %in% colnames(data)) data$a <- 0
if(!"b" %in% colnames(data)) data$b <- 0
if(!"c" %in% colnames(data)) data$c <- 0
data %>%
mutate(d = ifelse(a b == 0, 0, ((a/(a b))*c) a)) %>%
mutate(e = ifelse(a b == 0, 0, ((b/(a b))*c) b)) %>%
select(d,e)
}
Let's create some random data to test this:
set.seed(123)
df <- data.frame(a = rpois(5, 1), b = rpois(5, 2), c = rpois(5, 1))
df
#> a b c
#> 1 0 0 3
#> 2 2 2 1
#> 3 1 4 1
#> 4 2 2 1
#> 5 3 2 0
And we see that we get the expected output:
f_calculate_net(df)
#> d e
#> 1 0.0 0.0
#> 2 2.5 2.5
#> 3 1.2 4.8
#> 4 2.5 2.5
#> 5 3.0 2.0
Created on 2022-08-15 by the reprex package (v2.0.1)
CodePudding user response:
When a and b are zero a/b is NAN. If you want this case to be zero, try change a b c == 0 to (a b) == 0
Based on Allan's explanation and comment, another possibility is to make a logical vector of the same length as the number of rows:
f_calculate_net = function(data)
{ data %>%
mutate(a = ifelse(rep("a" %in% colnames(data), nrow(data)), a, 0)) %>%
mutate(b = ifelse(rep("b" %in% colnames(data), nrow(data)), b, 0)) %>%
mutate(c = ifelse(rep("c" %in% colnames(data), nrow(data)), c, 0)) %>%
mutate(d = ifelse(a b == 0, 0, ((a/(a b))*c) a)) %>%
mutate(e = ifelse(a b == 0, 0, ((b/(a b))*c) b)) %>%
select(d,e) }