Counting observations and considering condition-CodePudding

I have a database like this:

id <- c(rep(1,3), rep(2, 3), rep(3, 3))
condition <- c(0, 0, 1, 0, 0, 1, 1, 1, 0)
time_point1 <- c(1, 1, NA)
time_point2 <- c(NA, 1, NA)
time_point3 <- c(NA, NA, NA)
time_point4 <- c(1, NA, NA, 1, NA, NA, NA, NA, 1)

data <- data.frame(id, condition, time_point1, time_point2, time_point3, time_point4)
data

  id condition time_point1 time_point2 time_point3 time_point4
1  1         0           1          NA          NA           1
2  1         0           1           1          NA          NA
3  1         1          NA          NA          NA          NA
4  2         0           1          NA          NA           1
5  2         0           1           1          NA          NA
6  2         1          NA          NA          NA          NA
7  3         1           1          NA          NA          NA
8  3         1           1           1          NA          NA
9  3         0          NA          NA          NA           1

I want to make a table with how many have the condition == 1 (n_x) and also how many are in each time point (n_t). In case there is none also I want a 0. I tried this:

data %>% 
  pivot_longer(cols = contains("time_point")) %>% 
  filter (!is.na(value)) %>% 
  group_by(name) %>% 
  mutate(n_t = n_distinct(id)) %>% 
  ungroup() %>% 
  filter(condition == 1) %>%
  group_by(name) %>%
  summarise(n_x = n_distinct(id), n_t = first(n_t))

Obtaining this:

  name          n_x   n_t
  <chr>       <int> <int>
1 time_point1     1     3
2 time_point2     1     3

Desired Outcome: I want this type of table that considers the cases with condition and without it:

         name n_x n_t
1 time_point1   2   6
2 time_point2   1   3
3 time_point3   0   0
4 time_point4   0   3

Thank you!

CodePudding user response：

You can pivot_longer() to be able to group_by() time_points and then summarise just adding up the values. For conditions only sum values where the column values != NA.

data %>% 
  pivot_longer(cols=c(3:6),names_to = 'point', values_to='values') %>%
  group_by(point) %>% 
  summarise(n_x = sum(condition[!is.na(values)]), n_t = sum(values, na.rm = TRUE))

Output:

# A tibble: 4 x 3
  point         n_x   n_t
  <chr>       <dbl> <dbl>
1 time_point1     2     6
2 time_point2     1     3
3 time_point3     0     0
4 time_point4     0     3