I have the following data frame.
id <- c(1,1,1,2,2,2,3,3,3,4,4,4)
time <- c(1,2,3,1,2,3,1,2,3,1,2,3)
event1 <- c(0,0,1,0,0,1,0,0,0,1,0,0)
event2 <- c(0,1,0,0,0,0,0,1,0,1,0,0)
event3 <- c(0,0,0,0,0,0,0,1,0,1,0,0)
temp<-data.frame(id,time,event1,event2,event3)
id time event1 event2 event3
1 1 1 0 0 0
2 1 2 0 1 0
3 1 3 1 0 0
4 2 1 0 0 0
5 2 2 0 0 0
6 2 3 1 0 0
7 3 1 0 0 0
8 3 2 0 1 1
9 3 3 0 0 0
10 4 1 1 1 1
11 4 2 0 0 0
12 4 3 0 0 0
I wish to get a subset that each unique id and the event1, event2, event3 columns if that event happened (event = 1)
So that the output would be like this:
id event1 event2 event3
1 1 1 1 0
2 2 1 0 0
3 3 0 1 1
4 4 1 1 1
I tried to remove duplicates and keep the unique id but that gave me the wrong output since event1 through event3 might not happened in time = 1. The time doesn't matter. As long as the event happens in one of the three time it counts.
I was trying to replace the 0s with 1 by id if 1 exists in that id for that event but this didn't work.
temp %>% group_by(id) %>% mutate(id, event1=ifelse(event1==1,1,event1),
event2 = ifelse(event2==1,1,event2),
event3 =ifelse(event3 ==1,1,event3))
Any help would be appreciated.
library(tidyverse)
id <- c(1,1,1,2,2,2,3,3,3,4,4,4)
time <- c(1,2,3,1,2,3,1,2,3,1,2,3)
event1 <- as.factor(c(0,0,1,0,0,1,0,0,0,1,0,0))
event2 <-as.factor(c(0,1,0,0,0,0,0,1,0,1,0,0))
event3 <- as.factor(c(0,0,0,0,0,0,0,1,0,1,0,0))
temp<-data.frame(id,time,event1,event2,event3)
temp[,3:5][temp[,3:5]==0] <-NA
temp2 <- as.data.frame(temp%>% group_by(id)%>%
fill(event1,event2,event3,.direction ="downup")
)
temp3 <- temp2[!duplicated(temp2[,'id']),]
temp3[is.na(temp3)] <-0
temp3
This gave my desired result but I feel like it's overcomplicated.
CodePudding user response:
You can use summarise()
across()
in dplyr
:
library(dplyr)
temp %>%
group_by(id) %>%
summarise(across(contains("event"), sum))
The method above calculates the count of each event. If you just want to know whether an event happened or not, you can replace the sum
part with max
:
temp %>%
group_by(id) %>%
summarise(across(contains("event"), max))
# A tibble: 4 x 4
id event1 event2 event3
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 0
2 2 1 0 0
3 3 0 1 1
4 4 1 1 1
Other choices
# 1
across(contains("event"), ~ any(.x == 1))
# 2
across(contains("event"), ~ (sum(.x) > 0))
(Tip:
converts logical to binary integer)
CodePudding user response:
Here is a possible data.table
option:
library(data.table)
dt <- as.data.table(temp)
dt[, lapply(.SD, max, na.rm = TRUE), by=.(id), .SDcols=patterns("event")]
Or can use aggregate
from base R:
aggregate(cbind(event1, event2, event3) ~ id, data = temp, max, na.rm = TRUE)
Or the collapse
package:
library(collapse)
collap(temp, event1 event2 event3 ~ id, fmax)
Output
id event1 event2 event3
1: 1 1 1 0
2: 2 1 0 0
3: 3 0 1 1
4: 4 1 1 1