The data looks like this.
id<-c(rep(102,9),rep(103,5),rep(104,4))
status<-rep(c('single','relationship','relationship','single','single','single'),3)
age<-c(17:19,22,23,26,28,32,33,21:25,21:24)
data<-data.frame(id,status,age)
I would like to select only rows that are the OLDEST age observation row of single and YOUNGEST observation of relationship for each id, each time the status changes. (STEP 1)
The end product should look like this
newdata<-data[c(1,2,7,8,13,14,18),]
The next step is to change it into wide form, which I think I can do. (STEP 2)
long<-newdata %>%
group_by(id,status) %>%
mutate(dummy=row_number(),
status2=paste0(status,dummy)) %>%
ungroup() %>%
select(id,status2,age) %>%
pivot_wider(names_from = status2,values_from = age)
What I am not sure how to do is to code for STEP 1. I can only think of using slice_min and slice_max but that really depends on the status and also can occur multiple times for the same id.
Also perhaps there's a more clever way to combine step 1 and 2.
CodePudding user response:
Here is one approach using rleid
from data.table
to group your "single" and "relationship" status together.
I added a filter
to remove "relationship" status before the first "single" (based on desired output, this appeared to be true for ID 104).
Once grouped in this way, you can filter
based on status
and age
. If there is concern about multiple results for a given status
and age
let me know.
library(tidyverse)
library(data.table)
data %>%
group_by(id) %>%
filter(cumsum(status == "single") > 0) %>%
group_by(grp = rleid(status == "single"), .add = T) %>%
filter(status == "single" & age == max(age) |
status == "relationship" & age == min(age)) %>%
group_by(id, status) %>%
mutate(rn = row_number()) %>%
pivot_wider(id_cols = id, names_from = c(status, rn), values_from = age)
Output
id single_1 relationship_1 single_2 relationship_2
<dbl> <dbl> <dbl> <dbl> <dbl>
1 102 17 18 28 32
2 103 24 25 NA NA
3 104 24 NA NA NA
CodePudding user response:
With dplyr
you can do
library(tidyverse)
data %>%
group_by(id, status) %>%
filter(case_when(
status == "single" ~ age == max(age),
TRUE ~ age == min(age)
)) %>%
pivot_wider(names_from = status,
values_from = age)
# A tibble: 3 x 3
# Groups: id [3]
id relationship single
<dbl> <dbl> <dbl>
1 102 18 28
2 103 25 24
3 104 21 24