I am trying to select all those households where the occupants age group are below 15 and between 30 to 40 from the below data frame. Basically I am looking for households with kids along with their parents
household Members_age
100 75
100 74
100 30
101 20
101 50
101 60
102 35
102 40
102 5
Here only Household 102 satisfy the condition.
I got hint over here subset data based on condition in r but not able to add the 2nd condition to filter i.e. age between 30 - 40
Sample dataset:
library(dplyr)
library(sqldf)
data <- data.frame(household = c(100,100,100,101,101,101,102,102,102),
Members_age = c(75,74,30,20,50,60,35,40,5))
CodePudding user response:
We may use
subset(data, Members_age < 15 | (Members_age >30 & Members_age < 40))
-output
household Members_age
7 102 35
9 102 5
According to the OP's post Here only Household 102 satisfy the condition.
CodePudding user response:
An alternative using dplyr
library(dplyr)
data %>%
filter(Members_age < 15 | (Members_age > 30 & Members_age < 40))
CodePudding user response:
We can group_by
household and look for groups (households) that have both at least one age < 15 and at least one age between 30 and 40.
data %>%
group_by(household) %>%
filter(any(Members_age < 15) & any(between(Members_age, 30, 40))) %>%
ungroup() %>%
distinct(household)
# # A tibble: 1 x 1
# household
# <dbl>
# 1 102
CodePudding user response:
Using data.table
.
library(data.table)
# setDT(data)[which(Members_age < 15 | (Members_age > 30 & Members_age < 40))]
# household Members_age
# 1: 102 35
# 2: 102 5