R - case_when() creating a new variable based on conditions of multiple existing variables

Time:07-08

I am struggling to create a new variable named "edu_category" to indicate whether each person experiences Female Hypergamy (wife's education level < husband's), Female Homogamy (wife's education level == husband's), or Female Hypogamy (wife's education level > husband's).

My data looks like this (Female == 1 indicates this person is female, 0 indicates male):

PersonID Female EducationLevel SpouseID SpouseEducation
101 1 3 102 4
102 0 4 101 3
103 1 2 104 2
104 0 2 103 2
105 0 5 106 6
106 1 6 105 5
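
For reproducibility, the table above can be built as a data frame. This is only a sketch: the name `df1` is taken from the attempt further down, and storing every column as plain numeric is an assumption.

```r
# Sample data from the table above; df1 matches the name used in the attempt below.
df1 <- data.frame(
  PersonID        = c(101, 102, 103, 104, 105, 106),
  Female          = c(1, 0, 1, 0, 0, 1),
  EducationLevel  = c(3, 4, 2, 2, 5, 6),
  SpouseID        = c(102, 101, 104, 103, 106, 105),
  SpouseEducation = c(4, 3, 2, 2, 6, 5)
)

# Sanity check: each person's SpouseEducation should equal
# the EducationLevel recorded on the spouse's own row.
spouse_row <- match(df1$SpouseID, df1$PersonID)
all(df1$SpouseEducation == df1$EducationLevel[spouse_row])
```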

I wish to create a new variable so that my data looks like this:

PersonID Female EducationLevel SpouseID SpouseEducation edu_category
101 1 3 102 4 FHypergamy
102 0 4 101 3 FHypergamy
103 1 2 104 2 FHomogamy
104 0 2 103 2 FHomogamy
105 0 5 106 6 FHypogamy
106 1 6 105 5 FHypogamy

Here, let's look at the person with ID "105": his (because Female == 0) education level is 5 and his spouse's (person 106's) education level is 6, so this is Female Hypogamy, i.e. the wife's education > the husband's (we assume by default that everyone's spouse is of the opposite sex).

Now let's look at the person with ID "106": since she is person 105's spouse, "edu_category" gets the same value, "FHypogamy". So essentially, the couple is the unit being classified.

What I tried:

df2 <- df1 %>%
  mutate(edu_category = case_when((SpouseEducation > EducationLevel) | (Female == 1) ~ 'FemaleHypergamy',
                                   (SpouseEducation == EducationLevel) | (Female == 1) ~ 'FemaleHomogamy',
                                   (SpouseEducation < EducationLevel) | (Female == 1) ~ 'FemaleHypogamy',
                                   (SpouseEducation > EducationLevel) | (Female == 0) ~ 'FemaleHypogamy',
                                   (SpouseEducation == EducationLevel) | (Female == 0) ~ 'FemaleHomogamy',
                                   (SpouseEducation < EducationLevel) | (Female == 0) ~ 'FemaleHypergamy'))

However, it's not giving me accurate results: the variable "edu_category" is created successfully, but the values "FemaleHypergamy", "FemaleHomogamy", and "FemaleHypogamy" do not reflect the actual situations.

What should I do? Thank you for the help!

CodePudding user response:

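One way is to classify only the male rows with case_when() and then fill the resulting NAs from the neighbouring rows. Because fill() works by row position, this depends on the row order; it gives the right result for the sample data, where spouses sit on adjacent rows: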

library(dplyr)
library(tidyr)

df %>%
  # Classify the husbands' rows only; the wives' rows are left as NA for now
  mutate(edu_category = case_when(Female == 0 & EducationLevel < SpouseEducation ~ "FHypogamy",
                                  Female == 0 & EducationLevel == SpouseEducation ~ "FHomogamy",
                                  Female == 0 & EducationLevel > SpouseEducation ~ "FHypergamy",
                                  TRUE ~ NA_character_)) %>%
  # Copy each husband's category onto the NA rows (upwards first, then downwards)
  fill(edu_category, .direction = "updown")

  PersonID Female EducationLevel SpouseID SpouseEducation edu_category
1      101      1              3      102               4   FHypergamy
2      102      0              4      101               3   FHypergamy
3      103      1              2      104               2    FHomogamy
4      104      0              2      103               2    FHomogamy
5      105      0              5      106               6    FHypogamy
6      106      1              6      105               5    FHypogamy
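
A possible alternative that classifies every row directly and so does not depend on row order is sketched below. It is not the answerer's method: the helper columns `wife_edu` and `husband_edu` are hypothetical names introduced here, the data frame is rebuilt from the question's table, and the labels follow the question's desired output.

```r
library(dplyr)

# Sample data from the question's table
df1 <- data.frame(
  PersonID        = c(101, 102, 103, 104, 105, 106),
  Female          = c(1, 0, 1, 0, 0, 1),
  EducationLevel  = c(3, 4, 2, 2, 5, 6),
  SpouseID        = c(102, 101, 104, 103, 106, 105),
  SpouseEducation = c(4, 3, 2, 2, 6, 5)
)

df2 <- df1 %>%
  mutate(
    # Hypothetical helper columns: the wife's and husband's education in each couple,
    # assuming (as the question does) that every couple is one woman and one man.
    wife_edu    = if_else(Female == 1, EducationLevel, SpouseEducation),
    husband_edu = if_else(Female == 1, SpouseEducation, EducationLevel),
    # Both spouses get the same label because it is computed from the couple's values
    edu_category = case_when(
      wife_edu <  husband_edu ~ "FHypergamy",
      wife_edu == husband_edu ~ "FHomogamy",
      wife_edu >  husband_edu ~ "FHypogamy"
    )
  ) %>%
  select(-wife_edu, -husband_edu)

print(df2)
```

Since each row carries both its own and its spouse's education, no lookup of the spouse's row is needed.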

CodePudding user response:

library(dplyr)

df2 <- df1 %>%
  mutate(edu_category = case_when(
    # Spouse is more educated: hypergamy if this row is the wife, hypogamy if the husband
    SpouseEducation > EducationLevel & Female == 1 ~ 'FemaleHypergamy',
    SpouseEducation > EducationLevel & Female == 0 ~ 'FemaleHypogamy',
    # Spouse is less educated: the labels swap
    SpouseEducation < EducationLevel & Female == 1 ~ 'FemaleHypogamy',
    SpouseEducation < EducationLevel & Female == 0 ~ 'FemaleHypergamy',
    # Equal education is homogamy regardless of sex
    SpouseEducation == EducationLevel ~ 'FemaleHomogamy'))
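
The version above differs from the attempt in the question mainly in using `&` instead of `|`. With `|`, the first condition is TRUE for every woman regardless of education, and `case_when()` uses the first branch that matches. A small base-R sketch of that first condition on the question's data (the vector names `with_or` and `with_and` are only for illustration):

```r
# Columns from the question's sample data, in row order 101..106
Female          <- c(1, 0, 1, 0, 0, 1)
EducationLevel  <- c(3, 4, 2, 2, 5, 6)
SpouseEducation <- c(4, 3, 2, 2, 6, 5)

# First branch of the original attempt: OR is TRUE when either side is TRUE,
# so every row with Female == 1 matches, whatever the education levels are.
with_or <- (SpouseEducation > EducationLevel) | (Female == 1)

# Same branch with AND: both sides must hold, so only a wife whose
# spouse is more educated matches.
with_and <- (SpouseEducation > EducationLevel) & (Female == 1)

print(with_or)   # TRUE FALSE  TRUE FALSE  TRUE  TRUE
print(with_and)  # TRUE FALSE FALSE FALSE FALSE FALSE
```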