Home > Software engineering >  Create a variable to count the number of unique values in each row for a subset of columns
Create a variable to count the number of unique values in each row for a subset of columns

Time:12-16

I'd like to create a variable to count the number of unique values in each row for a subset of columns (i.e.,baseline,wave1,wave2,wave3). So far I have the below. I have included an example data set with a variable "example" to show what I am after. I also have included the variable "change", which shows the variable created using the code below.

# Create example data
data <- structure(list(age = c("18", "19", NA, "40", "21", "33", "32", 
"34", "43", "22"), baseline = c("1", "1", NA, "4", "1", "3", 
"2", "4", "3", "2"), wave1 = c("1", "1", "2", "4", "4", "3", 
"2", "4", "3", "2"), wave2 = c("1", "1", "4", "4", NA, "3", 
"2", "4", "3", "2"), wave3 = c("1", "2", NA, "4", "4", "3", 
"2", "4", "3", "4"), example = c("1", "2", "2", "1", "2", "1", 
"1", "1", "1", "2"), change = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L)), row.names = c(NA, -10L), groups = structure(list(.rows = structure(list(
    1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame")), class = c("rowwise_df", "tbl_df", "tbl", 
"data.frame"))

library(dplyr)
# Create a var for change at any point (ignoring NAs)
data <- data %>% 
  rowwise() %>% #perform operation by row
  mutate(change = length(unique(na.omit(baseline,wave1,wave2,wave3))))

CodePudding user response:

We can use n_distinct where we can use the na.rm argument to remove the NA elements (though in the OP's data, it was "NA")

library(dplyr)
data %>%
   type.convert(as.is = TRUE) %>%
   rowwise %>% 
   mutate(change = n_distinct(c_across(baseline:wave3), na.rm = TRUE)) %>%
   ungroup

-output

# A tibble: 10 × 7
     age baseline wave1 wave2 wave3 example change
   <int>    <int> <int> <int> <int>   <int>  <int>
 1    18        1     1     1     1       1      1
 2    19        1     1     1     2       2      2
 3    NA       NA     2     4    NA       2      2
 4    40        4     4     4     4       1      1
 5    21        1     4    NA     4       2      2
 6    33        3     3     3     3       1      1
 7    32        2     2     2     2       1      1
 8    34        4     4     4     4       1      1
 9    43        3     3     3     3       1      1
10    22        2     2     2     4       2      2

Or a faster option with dapply from collapse

library(collapse)
data$change <- dapply(slt(ungroup(data), baseline:wave3), 
      MARGIN = 1, FUN = fndistinct)
  • Related