Calculate how many times 3 events occur simultaneously

Time:08-24

I have a dataset composed only of variables whose value is 1 or 0. 1 means the presence of a certain event, while 0 means the absence of it.

df <- data.frame(event1 = c(1, 0, 0, 1, 0, 0, 1),
                 event2 = c(1, 1, 0, 1, 0, 0, 1),
                 event3 = c(1, 0, 0, 0, 0, 0, 0),
                 event4 = c(0, 1, 1, 0, 1, 0, 0),
                 event5 = c(0, 1, 0, 1, 0, 1, 1),
                 event6 = c(1, 0, 0, 0, 0, 0, 0))

I would like to have a matrix which gives me the association between 3 events, that is, how many times three variables in the same record all have a value equal to 1.

In the sample dataset above, I should have event1, event2 and event3 associated 1 time (first record), event2, event4 and event5 associated 1 time (second record), and so on.

I also would like to know how to extend this solution to more than just 3 events. I previously asked how to do this with only two events. I put the link here in case it can help find a solution.

CodePudding user response:

data.frame(nms = combn(names(df), 3, toString),
           count = colSums(combn(df, 3, rowSums) == 3))

                      nms count
1  event1, event2, event3     1
2  event1, event2, event4     0
3  event1, event2, event5     2
4  event1, event2, event6     1
5  event1, event3, event4     0
6  event1, event3, event5     0
7  event1, event3, event6     1
8  event1, event4, event5     0
9  event1, event4, event6     0
10 event1, event5, event6     0
11 event2, event3, event4     0
12 event2, event3, event5     0
13 event2, event3, event6     1
14 event2, event4, event5     1
15 event2, event4, event6     0
16 event2, event5, event6     0
17 event3, event4, event5     0
18 event3, event4, event6     0
19 event3, event5, event6     0
20 event4, event5, event6     0
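
To extend this to more than 3 events (the second part of the question), the same idea can be wrapped in a function that takes the number of events as a parameter. This is only a sketch: the name count_cooccurrence and its argument k are my own, not from the answer above, and it assumes every column contains only 0 and 1 with no NA values.

```r
df <- data.frame(event1 = c(1, 0, 0, 1, 0, 0, 1),
                 event2 = c(1, 1, 0, 1, 0, 0, 1),
                 event3 = c(1, 0, 0, 0, 0, 0, 0),
                 event4 = c(0, 1, 1, 0, 1, 0, 0),
                 event5 = c(0, 1, 0, 1, 0, 1, 1),
                 event6 = c(1, 0, 0, 0, 0, 0, 0))

# Hypothetical helper: for every combination of k columns, count the rows
# in which all k columns equal 1. Assumes 0/1 data without NA.
count_cooccurrence <- function(df, k) {
  stopifnot(k >= 1, k <= ncol(df))
  # list of character vectors, one per k-column combination
  combos <- combn(names(df), k, simplify = FALSE)
  data.frame(
    nms   = vapply(combos, toString, character(1)),
    # a row has all k events exactly when its row sum over those columns is k
    count = vapply(combos, function(cols) sum(rowSums(df[cols]) == k),
                   integer(1))
  )
}

res3 <- count_cooccurrence(df, 3)  # same counts as the table above
res4 <- count_cooccurrence(df, 4)  # all 15 combinations of four events
```

Keep in mind that the number of combinations is choose(ncol(df), k), so this can get slow for wide data frames.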

CodePudding user response:

I'm not sure what type of output you are looking for, but here I use combn() to get all combinations of three columns. For each combination, I count how many rows have all three events:

do.call(
  rbind,
  lapply(combn(colnames(df), 3, simplify = FALSE), \(i) {
    data.frame(events = paste0(i, collapse = ","), ct = sum(rowSums(df[, i]) == 3))
  })
)

Output:

                 events ct
1  event1,event2,event3  1
2  event1,event2,event4  0
3  event1,event2,event5  2
4  event1,event2,event6  1
5  event1,event3,event4  0
6  event1,event3,event5  0
7  event1,event3,event6  1
8  event1,event4,event5  0
9  event1,event4,event6  0
10 event1,event5,event6  0
11 event2,event3,event4  0
12 event2,event3,event5  0
13 event2,event3,event6  1
14 event2,event4,event5  1
15 event2,event4,event6  0
16 event2,event5,event6  0
17 event3,event4,event5  0
18 event3,event4,event6  0
19 event3,event5,event6  0
20 event4,event5,event6  0
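
Most combinations never occur, so it may be more readable to keep only the non-zero rows and sort them. A small sketch along those lines, where res and nonzero are just names I picked, and I use function(i) instead of \(i) so that it also runs on R versions before 4.1:

```r
df <- data.frame(event1 = c(1, 0, 0, 1, 0, 0, 1),
                 event2 = c(1, 1, 0, 1, 0, 0, 1),
                 event3 = c(1, 0, 0, 0, 0, 0, 0),
                 event4 = c(0, 1, 1, 0, 1, 0, 0),
                 event5 = c(0, 1, 0, 1, 0, 1, 1),
                 event6 = c(1, 0, 0, 0, 0, 0, 0))

# Same computation as above, stored in a variable
res <- do.call(
  rbind,
  lapply(combn(colnames(df), 3, simplify = FALSE), function(i) {
    data.frame(events = paste0(i, collapse = ","),
               ct = sum(rowSums(df[, i]) == 3))
  })
)

# Keep only combinations that occur at least once, most frequent first
nonzero <- res[res$ct > 0, ]
nonzero <- nonzero[order(-nonzero$ct), ]
```

With the sample data this leaves 6 of the 20 combinations, with event1,event2,event5 (2 occurrences) at the top.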