Creating a variable from other two categorical variables R

Time:08-21

I searched StackOverflow for similar questions and answers, but I could not find the answer I was looking for.

I have two categorical variables:

  1. Region of education
  2. Educational residence

Both variables (region of education and educational residence) are in one dataset, which I created by merging two different datasets.

  1. Region of education has the levels Western, Non-Western and Unknown
  2. Educational residence has only two levels: In the USA and Out of the USA

Now I need to create a new categorical variable that recodes Western as "Education from Western countries", Non-Western as "Education from Non-Western countries" and Unknown as "Unknown", while from educational residence I only need the "In the USA" level, which should become "Education from the USA".

So in the end, the new variable will have four categories:

Education from western countries, Education from non-western countries, Education from the USA, Unknown

Does anyone have an idea of how to do this?

I apologise that I cannot post my data because of ethical and legal restrictions.

I would be very grateful for any help.

Answer:

You don't need to share real data to get a pointer to the correct answer; a simple reproducible example is enough. Based on your description, the relevant columns in your data frame should look something like this reproducible example:

# Reproducible example: 10 random rows with the two categorical columns
set.seed(1)
df <- data.frame(Region = sample(c("Western", "Non-Western", "Unknown"), 10, TRUE),
                 Residence = sample(c("USA", "Non-USA"), 10, TRUE))

df
#>         Region Residence
#> 1      Western       USA
#> 2      Unknown       USA
#> 3      Western       USA
#> 4  Non-Western       USA
#> 5      Western   Non-USA
#> 6      Unknown   Non-USA
#> 7      Unknown   Non-USA
#> 8  Non-Western   Non-USA
#> 9  Non-Western       USA
#> 10     Unknown       USA

We can smoosh these columns together using ifelse. Where the Residence column is "USA", the output will be "USA", and otherwise it will retain the "Western", "Non-Western" and "Unknown" levels from the Region column:

# as.character() keeps the Region labels even if Region is stored as a factor
df$Education <- ifelse(df$Residence == "USA", "USA", as.character(df$Region))

df
#>         Region Residence   Education
#> 1      Western       USA         USA
#> 2      Unknown       USA         USA
#> 3      Western       USA         USA
#> 4  Non-Western       USA         USA
#> 5      Western   Non-USA     Western
#> 6      Unknown   Non-USA     Unknown
#> 7      Unknown   Non-USA     Unknown
#> 8  Non-Western   Non-USA Non-Western
#> 9  Non-Western       USA         USA
#> 10     Unknown       USA         USA

Created on 2022-08-20 with reprex v2.0.2
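The question asks for four descriptive labels rather than the raw codes. A minimal sketch of that last step, assuming the same "USA"/"Non-USA" coding as in the reprex (in the real data the comparison would be against "In the USA"), uses a named lookup vector for the Region labels and then stores the result as a factor with a fixed level order. The small data frame and the name `region_labels` are illustrative only.

```r
# Small, fixed example data (hypothetical values, not the asker's data)
df <- data.frame(
  Region    = c("Western", "Non-Western", "Unknown", "Western", "Unknown"),
  Residence = c("USA", "Non-USA", "Non-USA", "Non-USA", "USA"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: Region code -> label requested in the question
region_labels <- c(
  "Western"     = "Education from Western countries",
  "Non-Western" = "Education from non-Western countries",
  "Unknown"     = "Unknown"
)

# USA residence takes priority; otherwise look up the label for Region.
# as.character() makes the lookup use the labels even if Region is a factor.
df$Education <- ifelse(
  df$Residence == "USA",
  "Education from the USA",
  unname(region_labels[as.character(df$Region)])
)

# Store the result as a factor with the four categories in a fixed order
df$Education <- factor(
  df$Education,
  levels = c("Education from Western countries",
             "Education from non-Western countries",
             "Education from the USA",
             "Unknown")
)

df
```

A non-USA row whose Region is not one of the names in `region_labels` (including NA), and any row where Residence is NA, ends up as NA, so after merging two datasets it is worth checking `anyNA(df$Education)`.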
