Home > OS >  Select rows based on conditions in R
Select rows based on conditions in R

Time:08-28

The data looks like this.

id<-c(rep(102,9),rep(103,5),rep(104,4))
status<-rep(c('single','relationship','relationship','single','single','single'),3)
age<-c(17:19,22,23,26,28,32,33,21:25,21:24)

data<-data.frame(id,status,age)

I would like to select only rows that are the OLDEST age observation row of single and YOUNGEST observation of relationship for each id, each time the status changes. (STEP 1)

The end product should look like this

newdata<-data[c(1,2,7,8,13,14,18),]

The next step is to change it into wide form, which I think I can do. (STEP 2)

long<-newdata %>% 
  group_by(id,status) %>% 
  mutate(dummy=row_number(),
         status2=paste0(status,dummy)) %>% 
  ungroup() %>%
  select(id,status2,age) %>%
  pivot_wider(names_from = status2,values_from = age)

What I am not sure how to do is to code for STEP 1. I can only think of using slice_min and slice_max but that really depends on the status and also can occur multiple times for the same id.

Also perhaps there's a more clever way to combine step 1 and 2.

CodePudding user response:

Here is one approach using rleid from data.table to group your "single" and "relationship" status together.

I added a filter to remove "relationship" status before the first "single" (based on desired output, this appeared to be true for ID 104).

Once grouped in this way, you can filter based on status and age. If there is concern about multiple results for a given status and age let me know.

library(tidyverse)
library(data.table)

data %>%
  group_by(id) %>%
  filter(cumsum(status == "single") > 0) %>%
  group_by(grp = rleid(status == "single"), .add = T) %>%
  filter(status == "single" & age == max(age) |
           status == "relationship" & age == min(age)) %>%
  group_by(id, status) %>%
  mutate(rn = row_number()) %>%
  pivot_wider(id_cols = id, names_from = c(status, rn), values_from = age)

Output

     id single_1 relationship_1 single_2 relationship_2
  <dbl>    <dbl>          <dbl>    <dbl>          <dbl>
1   102       17             18       28             32
2   103       24             25       NA             NA
3   104       24             NA       NA             NA

CodePudding user response:

With dplyr you can do

library(tidyverse)

data %>%  
  group_by(id, status) %>%  
  filter(case_when(
    status == "single" ~ age == max(age), 
    TRUE ~ age == min(age)
  )) %>%  
  pivot_wider(names_from = status, 
              values_from = age)

# A tibble: 3 x 3
# Groups:   id [3]
     id relationship single
  <dbl>        <dbl>  <dbl>
1   102           18     28
2   103           25     24
3   104           21     24
  •  Tags:  
  • r
  • Related