Home > Net >  How to combine member of one column and then count other columns in R?
How to combine member of one column and then count other columns in R?

Time:05-02

I have a data frame:

df <- structure(list(ID = c("x1", "x1", "x1", "x1", "x1", "x1", "x2", "x2", "x2", "x2", "x2", "x2", "x3", "x3", "x3", "x3", "x3", "x3", "x1", "x1", "x1", "x1", "x1", "x1", "x2", "x2", "x2", "x2", "x2", "x2", "x3", "x3", "x3", "x3", "x3", "x3"), col1=c("a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2"), col2 = c("a", "b", "c", "d", "e", "f", "a", "b", "c", "d", "e", "f","a", "b", "c", "d", "e", "f","a", "b", "c", "d", "e", "f", "a", "b", "c", "d", "e", "f","a", "b", "c", "d", "e", "f"), col3 = c(2,13,1,21,0,5,3,0,6,4,50,0,0,0,0,9,5,0,51,3,6,0,0,9,89,4,29,1,4,17,6,16,9,1,0,0)), 
class = "data.frame", row.names = c(NA,-36L))
ID col1 col2 col3
x1  a1  a   2
x1  a1  b   13
x1  a1  c   1
x1  a1  d   21
x1  a1  e   0
x1  a1  f   5
x2  a1  a   3
x2  a1  b   0
x2  a1  c   6
x2  a1  d   4
x2  a1  e   50
x2  a1  f   0
x3  a1  a   0
x3  a1  b   0
x3  a1  c   0
x3  a1  d   9
x3  a1  e   5
x3  a1  f   0
x1  a2  a   51
x1  a2  b   3
x1  a2  c   6
x1  a2  d   0
x1  a2  e   0
x1  a2  f   9
x2  a2  a   89
x2  a2  b   4
x2  a2  c   29
x2  a2  d   1
x2  a2  e   4
x2  a2  f   17
x3  a2  a   6
x3  a2  b   16
x3  a2  c   9
x3  a2  d   1
x3  a2  e   0
x3  a2  f   0

I want to count the unique IDs that have "a", "b" or "c" >0 (more than zero), then "d" or "e" >0, and finally "f">0. Then get the sum of all (abc), (de) and (f) separately in a different column. So the result would look like the following:

df2<- structure(list(col1=c("a1","a1","a1","a2","a2","a2"), col2 = c("abc", "de", "f", "abc", "de", "f"), count.ID = c(2,3,1,3,2,2), total=c(25,89,5,213,6,26)),
                class = "data.frame", row.names = c(NA,-6L))
col1 col2 count.ID total
a1  abc 2   25
a1  de  3   89
a1  f   1   5
a2  abc 3   213
a2  de  2   6
a2  f   2   26

How is this possible in R? Thanks

CodePudding user response:

One premise would be to create a frame of grouping variables mapping old col2 to the new combined col2, and then merge/join it to the original data.

dplyr

library(dplyr)
groups <- data.frame(col2=c("a","b","c","d","e","f"), col2b=c("abc","abc","abc","de","de","f"))
left_join(df, groups, by = "col2") %>%
  group_by(col1, col2 = col2b) %>%
  summarize(count.ID = length(unique(ID[col3 > 0])), total = sum(col3)) %>%
  ungroup()
# # A tibble: 6 x 4
#   col1  col2  count.ID total
#   <chr> <chr>    <int> <dbl>
# 1 a1    abc          2    25
# 2 a1    de           3    89
# 3 a1    f            1     5
# 4 a2    abc          3   213
# 5 a2    de           2     6
# 6 a2    f            2    26
  • Related