How to classify groups based on the individuals who compose them?

Here's my problem: I have a data frame of individuals (one individual per row). Each individual belongs to a household (given by the variable ID_household) and has an age (variable age). I want to create a new column, type, that classifies each household according to the ages of the individuals in it:

If there are 2 adults (people more than 18 years old) and no minors, type takes the value "couple";
If there is 1 adult and at least 1 minor, with an age difference of at least 15 years, type = "single parent family";
If there are 2 adults and at least 1 minor, with an age difference of at least 15 years, type = "couple with children";
If there is only one person in the household, type = "single person".

Here's the script to create the data. ID_household and age are the original columns; type is the column I want to create, but I don't know how to do it:

data <- data.frame(ID_household = c(1, 1, 2, 3, 3, 4, 5, 6, 6, 6, 7, 8, 8, 8, 8, 9, 9, 10, 11, 11, 11, 11),
           age = c(31, 29, 36, 24, 34, 42, 19, 39, 6, 9, 42, 4, 6, 29, 34, 41, 12, 51, 26, 27, 1, 3),
           type = c("couple", "couple", "single person", "couple", "couple", "single person", "single person",
                    "single parent family", "single parent family", "single parent family", "single person",
                    "couple with children", "couple with children", "couple with children", "couple with children", 
                    "single parent family", "single parent family", "single person", "couple with children",
                    "couple with children", "couple with children", "couple with children"))

data
   ID_household age                 type
1             1  31               couple
2             1  29               couple
3             2  36        single person
4             3  24               couple
5             3  34               couple
6             4  42        single person
7             5  19        single person
8             6  39 single parent family
9             6   6 single parent family
10            6   9 single parent family
11            7  42        single person
12            8   4 couple with children
13            8   6 couple with children
14            8  29 couple with children
15            8  34 couple with children
16            9  41 single parent family
17            9  12 single parent family
18           10  51        single person
19           11  26 couple with children
20           11  27 couple with children
21           11   1 couple with children
22           11   3 couple with children

CodePudding user response:

I would do it by creating helper variables for the number of kids, the number of adults and the age difference, and then using case_when(). In the code below, I create type2 so it can be compared with the type variable in your dataset:

library(dplyr)

data <- data %>% 
  group_by(ID_household) %>% 
  mutate(n_adult = sum(age > 18),     # number of adults in the household
         n_kids = sum(age <= 18),     # number of minors in the household
         # youngest adult and oldest minor (0 when there are no minors)
         min_adult_age  = min(age[which(age > 18)]), 
         max_kid_age = ifelse(n_kids > 0, max(age[which(age <= 18)]), 0),  
         age_diff = min_adult_age - max_kid_age, 
         type2 = case_when(
            n_adult == 2 & n_kids > 0 & age_diff >= 15 ~ "couple with children", 
            n_adult == 1 & n_kids > 0 & age_diff >= 15 ~ "single parent family", 
            n_adult == 2 & n_kids == 0 ~ "couple",
            n_adult == 1 & n_kids == 0 ~ "single person", 
            TRUE ~ NA_character_)) %>%   # anything else stays unclassified
  ungroup() %>% 
  select(-(n_adult:age_diff))            # drop the helper columns

all(data$type == data$type2)           
#[1] TRUE
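
To make the helper variables concrete, here is a small base R sketch that computes them by hand for household 8 from the question's data. It only mirrors the arithmetic inside mutate() for a single group; the variable names are the same as in the answer above, and nothing else is assumed.

```r
# Ages of the four members of household 8 in the question's data.
age <- c(4, 6, 29, 34)

n_adult <- sum(age > 18)                  # 2 adults
n_kids <- sum(age <= 18)                  # 2 minors
min_adult_age <- min(age[age > 18])       # youngest adult: 29
max_kid_age <- max(age[age <= 18])        # oldest minor: 6
age_diff <- min_adult_age - max_kid_age   # 29 - 6 = 23

# Two adults, at least one minor and a gap of at least 15 years,
# so the first case_when() branch applies: "couple with children".
n_adult == 2 & n_kids > 0 & age_diff >= 15
```

Using the youngest adult and the oldest minor means the 15-year rule is checked against the tightest pair in the household.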

CodePudding user response:

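Here is a base R way with ave(). Note that it counts anyone aged 18 or over as an adult and does not check the 15-year age difference, which is enough for the sample data. The \(x) lambda syntax needs R 4.1 or later.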

type <- with(data, ave(age, ID_household, FUN = \(x){
  if(length(x) < 2) {
    "single person"
  } else if(length(x) == 2L && all(x >= 18)) {
    "couple"
  } else if(sum(x >= 18) == 1){
    "single parent family"
  } else "couple with children"
}))

identical(data$type, type)
#[1] TRUE
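
The ave() version above classifies by household size and adult count only. If the 15-year age difference from the question also has to be enforced in base R, one possible sketch is shown below. The helper classify_household(), its min_gap argument and the type3 column are hypothetical names introduced for illustration, and the sketch assumes "adult" means age > 18, as in the dplyr answer.

```r
# Hypothetical helper: classify one household from the vector of its members' ages.
# Assumption: an adult is anyone with age > 18; everyone else is a minor.
classify_household <- function(age, min_gap = 15) {
  adults <- age[age > 18]
  kids <- age[age <= 18]
  # Youngest adult minus oldest minor; only evaluated when both groups exist.
  gap_ok <- length(adults) > 0 && length(kids) > 0 &&
    (min(adults) - max(kids)) >= min_gap
  if (length(adults) == 1 && length(kids) == 0) {
    "single person"
  } else if (length(adults) == 2 && length(kids) == 0) {
    "couple"
  } else if (length(adults) == 1 && gap_ok) {
    "single parent family"
  } else if (length(adults) == 2 && gap_ok) {
    "couple with children"
  } else {
    NA_character_  # household matches none of the four rules
  }
}

# Same ID_household and age columns as in the question.
data <- data.frame(
  ID_household = c(1, 1, 2, 3, 3, 4, 5, 6, 6, 6, 7, 8, 8, 8, 8, 9, 9, 10, 11, 11, 11, 11),
  age = c(31, 29, 36, 24, 34, 42, 19, 39, 6, 9, 42, 4, 6, 29, 34, 41, 12, 51, 26, 27, 1, 3)
)

# One label per household, then mapped back onto the individual rows by ID.
labels <- vapply(split(data$age, data$ID_household), classify_household, character(1))
data$type3 <- unname(labels[as.character(data$ID_household)])
```

Households that fit none of the rules (for example three adults, or a minor less than 15 years younger than the youngest adult) come back as NA instead of being forced into a category.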