Home > Net >  Subset data in r based on two different condition at a time
Subset data in r based on two different condition at a time

Time:09-28

I am trying to select all those households where the occupants age group are below 15 and between 30 to 40 from the below data frame. Basically I am looking for households with kids along with their parents

household Members_age
   100          75
   100          74
   100          30
   101          20
   101          50
   101          60
   102          35
   102          40
   102           5

Here only Household 102 satisfy the condition.

I got hint over here subset data based on condition in r but not able to add the 2nd condition to filter i.e. age between 30 - 40

Sample dataset:

library(dplyr)
library(sqldf)
data <- data.frame(household = c(100,100,100,101,101,101,102,102,102),
           Members_age = c(75,74,30,20,50,60,35,40,5))

CodePudding user response:

We may use

subset(data, Members_age < 15 | (Members_age >30 & Members_age < 40))

-output

 household Members_age
7       102          35
9       102           5

According to the OP's post Here only Household 102 satisfy the condition.

CodePudding user response:

An alternative using dplyr

library(dplyr)

data %>% 
  filter(Members_age < 15 | (Members_age > 30 & Members_age < 40))

CodePudding user response:

We can group_by household and look for groups (households) that have both at least one age < 15 and at least one age between 30 and 40.

data %>%
  group_by(household) %>%
  filter(any(Members_age < 15) & any(between(Members_age, 30, 40))) %>%
  ungroup() %>%
  distinct(household)
# # A tibble: 1 x 1
# household
# <dbl>
#   1       102

CodePudding user response:

Using data.table.

library(data.table)
# setDT(data)[which(Members_age < 15 | (Members_age > 30 & Members_age < 40))]
#    household Members_age
# 1:       102          35
# 2:       102           5
  •  Tags:  
  • r
  • Related