ID | score | Time |
---|---|---|
1 | 1000000 | 1 |
2 | 1000000 | 1 |
2 | 1000000 | 2 |
3 | 1000000 | 1 |
3 | 1000000 | 2 |
4 | 1000000 | 1 |
5 | 1000000 | 1 |
5 | 1000000 | 2 |
How do I subset this database to a new data base including only ID's with at least 2 follow-ups? Follow-up is resembled as "Time", whereas people with only Time 1 had one follow-up and I want to exclude them from the database. Thanks.
CodePudding user response:
We can either use a frequency based on approach to filter
the 'ID's having more than one observation after grouping by 'ID'
library(dplyr)
df1 %>%
group_by(ID) %>%
filter(n() > 1) %>%
ungroup
Or in base R
- also use subset
to subset the rows where it checks for 'ID's that have Time
value greater than 1
subset(df1, ID %in% ID[Time > 1])
data
df1 <- structure(list(ID = c(1L, 2L, 2L, 3L, 3L, 4L, 5L, 5L), score = c(1000000L,
1000000L, 1000000L, 1000000L, 1000000L, 1000000L, 1000000L, 1000000L
), Time = c(1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L)),
class = "data.frame", row.names = c(NA,
-8L))
CodePudding user response:
Here is another option using dplyr
, where we first count
the number of observations for each ID
, then filter
to keep only the ID
s with at least 2 observations, then remove the count column.
library(dplyr)
df %>%
add_count(ID, name = "obs") %>%
filter(obs > 1) %>%
select(-obs)
Output
ID score Time
1 2 1000000 1
2 2 1000000 2
3 3 1000000 1
4 3 1000000 2
5 5 1000000 1
6 5 1000000 2
Or another option using data.table
:
library(data.table)
setDT(df)[,if(.N > 1) .SD, by=ID]
Data
df <- structure(list(ID = c(1L, 2L, 2L, 3L, 3L, 4L, 5L, 5L), score = c(1000000L,
1000000L, 1000000L, 1000000L, 1000000L, 1000000L, 1000000L, 1000000L
), Time = c(1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L)), class = "data.frame", row.names = c(NA,
-8L))