I'm struggling to calculate the wear of a component using the lag of a variable. I need to calculate the wear within different groups, so I'm using the group_by function. The problem is that when I group by the variable I actually need, the result is a column of NAs, but when I test by grouping on another variable that has fewer factor levels, the calculation works.
The data frame I'm using has 4093902 rows and 52 columns. The variable I need to group by for my wear calculation has 90183 factor levels. The other one, which I tested and which worked, has 11321 factor levels.
Here's the code I'm using:
final_date <- result_data %>%
  arrange(time) %>%
  group_by(id_specific) %>%
  mutate(wear = dplyr::lag(some_value, n = 1, default = NA) - some_value)
Does anyone know if there is a factor limit for grouping? Or any other tips on how I can perform this calculation?
CodePudding user response:
The NA can come from either of two places: lag, which by default returns NA for the first value (the first row of each group, when the data is grouped), or the other column's value, which can itself be NA. When we apply - (or any arithmetic operator), an NA on either the lhs or the rhs makes the result NA. One option is to use a function (rowSums) that accepts na.rm = TRUE:
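library(dplyr)

final_date <- result_data %>%
  arrange(time) %>%
  group_by(id_specific) %>%
  # previous value within each group; NA on the first row of a group
  mutate(some_value_new = dplyr::lag(some_value, n = 1, default = NA)) %>%
  ungroup() %>%
  # rowSums with na.rm = TRUE treats an NA on either side as 0,
  # so the first row of each group gets -some_value instead of NA
  mutate(wear = rowSums(cbind(some_value_new, -1 * some_value),
                        na.rm = TRUE),
         some_value_new = NULL)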
NOTE: It is also better to ungroup before calling rowSums, for efficiency.
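To check which of the two causes described above applies, it may help to count the NAs per group. The following is only a minimal sketch on made-up toy data: the column names id_specific, time and some_value are taken from the question, while toy, toy_wear, na_summary and all the values are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; only the column names come from the question
toy <- tibble(
  id_specific = c("a", "a", "a", "b", "b", "c"),
  time        = c(1, 2, 3, 1, 2, 1),
  some_value  = c(10, 8, 5, 7, NA, 4)
)

# Same calculation as in the question
toy_wear <- toy %>%
  arrange(time) %>%
  group_by(id_specific) %>%
  mutate(wear = dplyr::lag(some_value, n = 1, default = NA) - some_value) %>%
  ungroup()

# Per group: number of rows, NAs already present in some_value, NAs in wear
na_summary <- toy_wear %>%
  group_by(id_specific) %>%
  summarise(
    n_rows     = n(),
    n_na_value = sum(is.na(some_value)),
    n_na_wear  = sum(is.na(wear))
  )

print(na_summary)
```

In this toy data, group "c" has a single row, so its only wear value is NA, and group "b" has an NA in some_value that propagates through the subtraction. If the real grouping variable produces many very small groups, or some_value contains many NAs, that would be consistent with a wear column that is mostly NA rather than with any limit on the number of groups.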