Here's my problem: I have a dataset of individuals (one individual per row). Each individual belongs to a household (given by the variable `ID_household`) and has an age (variable `age`). I want to create a new column `type` that defines the type of household based on the ages of the individuals who form the same household:
- If there are 2 adults (two people more than 18 years old) and no minors, `type` takes the value "couple";
- If there is 1 adult and at least 1 minor, with a minimum age difference of 15 years, `type` is "single parent family";
- If there are 2 adults and at least 1 minor, with a minimum age difference of 15 years, `type` is "couple with children";
- If there is a single person, `type` is "single person".
Here's a script to create the data. `ID_household` and `age` are the original columns. `type` is the column I want to create, but I don't know how to do it:
data <- data.frame(
  ID_household = c(1, 1, 2, 3, 3, 4, 5, 6, 6, 6, 7, 8, 8, 8, 8, 9, 9, 10, 11, 11, 11, 11),
  age = c(31, 29, 36, 24, 34, 42, 19, 39, 6, 9, 42, 4, 6, 29, 34, 41, 12, 51, 26, 27, 1, 3),
  type = c("couple", "couple", "single person", "couple", "couple", "single person", "single person",
           "single parent family", "single parent family", "single parent family", "single person",
           "couple with children", "couple with children", "couple with children", "couple with children",
           "single parent family", "single parent family", "single person", "couple with children",
           "couple with children", "couple with children", "couple with children"))
data
   ID_household age                 type
1             1  31               couple
2             1  29               couple
3             2  36        single person
4             3  24               couple
5             3  34               couple
6             4  42        single person
7             5  19        single person
8             6  39 single parent family
9             6   6 single parent family
10            6   9 single parent family
11            7  42        single person
12            8   4 couple with children
13            8   6 couple with children
14            8  29 couple with children
15            8  34 couple with children
16            9  41 single parent family
17            9  12 single parent family
18           10  51        single person
19           11  26 couple with children
20           11  27 couple with children
21           11   1 couple with children
22           11   3 couple with children
CodePudding user response:
I would do it by creating helper variables for the number of kids, the number of adults, and the age difference, and then using `case_when()`. In the code below, I create `type2` to compare with the `type` variable in your dataset:
library(dplyr)

data <- data %>%
  group_by(ID_household) %>%
  mutate(
    # Household composition
    n_adult = sum(age > 18),
    n_kids = sum(age <= 18),
    # Gap between the youngest adult and the oldest kid (0 stands in when there are no kids)
    min_adult_age = min(age[age > 18]),
    max_kid_age = ifelse(n_kids > 0, max(age[age <= 18]), 0),
    age_diff = min_adult_age - max_kid_age,
    type2 = case_when(
      n_adult == 2 & n_kids > 0 & age_diff >= 15 ~ "couple with children",
      n_adult == 1 & n_kids > 0 & age_diff >= 15 ~ "single parent family",
      n_adult == 2 & n_kids == 0 ~ "couple",
      n_adult == 1 & n_kids == 0 ~ "single person",
      TRUE ~ NA_character_)) %>%
  ungroup() %>%
  # Drop the helper columns, keeping only type2
  select(-(n_adult:age_diff))
all(data$type == data$type2)
#[1] TRUE
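The `ifelse()` guard on `max_kid_age` is there because `max()` of an empty vector returns `-Inf` (with a warning). A small base R sketch of that edge case, where the `ages` vector is made up for illustration:

```r
ages <- c(31, 29)          # hypothetical household with no minors
kids <- ages[ages <= 18]   # numeric(0)
n_kids <- length(kids)

# Without a guard, max() on an empty vector gives -Inf and warns
unguarded <- suppressWarnings(max(kids))

# With the guard, 0 stands in for the missing kid age
guarded <- if (n_kids > 0) max(kids) else 0
age_diff <- min(ages[ages > 18]) - guarded
```

With the guard, `age_diff` stays finite for households without kids; the "couple" and "single person" branches never look at it anyway.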
CodePudding user response:
Here is a base R way with `ave`.
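# ave() applies FUN to the ages of each household; the returned label
# replaces the ages group by group, so the result is a character vector.
# Adults are people older than 18, as in the question.
# Note: this version does not check the 15-year age difference.
type <- with(data, ave(age, ID_household, FUN = \(x) {
  if (length(x) < 2) {
    "single person"
  } else if (length(x) == 2L && all(x > 18)) {
    "couple"
  } else if (sum(x > 18) == 1) {
    "single parent family"
  } else "couple with children"
}))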
identical(data$type, type)
#[1] TRUE
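The `ave` version above does not test the 15-year age gap from the question. Here is a minimal base R sketch that does, assuming "adult" means older than 18 as in the question. The names `classify_household` and `min_gap`, and household 12, are made up for illustration:

```r
# Hypothetical helper: label one household from the ages of its members.
classify_household <- function(ages, min_gap = 15) {
  adults <- ages[ages > 18]
  minors <- ages[ages <= 18]
  n_adult <- length(adults)
  n_minor <- length(minors)
  # Gap between the youngest adult and the oldest minor (NA if either group is empty)
  gap <- if (n_adult > 0 && n_minor > 0) min(adults) - max(minors) else NA_real_

  if (n_adult == 1 && n_minor == 0) {
    "single person"
  } else if (n_adult == 2 && n_minor == 0) {
    "couple"
  } else if (n_adult == 1 && n_minor > 0 && gap >= min_gap) {
    "single parent family"
  } else if (n_adult == 2 && n_minor > 0 && gap >= min_gap) {
    "couple with children"
  } else {
    NA_character_  # composition not covered by the four rules
  }
}

households <- data.frame(
  ID_household = c(1, 1, 2, 6, 6, 6, 8, 8, 8, 8, 12, 12),
  age          = c(31, 29, 36, 39, 6, 9, 4, 6, 29, 34, 25, 14)
)
# Household 12 is invented: its gap (25 - 14 = 11) is below 15, so it gets NA.

# One label per household, then mapped back onto the rows
labels <- vapply(split(households$age, households$ID_household),
                 classify_household, character(1))
households$type <- unname(labels[as.character(households$ID_household)])
```

Households that match none of the four rules get `NA` rather than a forced label, which makes gaps in the rule set easy to spot.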