Applying factor labels based on a condition in R

Time:09-23

I have a data set with a variable 'education' which is coded differently in each of the three countries included, for example:

Code  Country 1     Country 2     Country 3
1     No education  No education  No education
2     Primary       Primary       Islamic education
3     Secondary     Secondary     Primary
4     NA            NA            Secondary

I need to turn these codes into a factor, but the label for each code differs by country.

Below is my attempt, but it doesn't appear to work:

df <- data.frame(
  Country = sample(c("Country 1", "Country 2", "Country 3"), 100, replace = TRUE), 
  Education_1 = sample(1:4)
)

df$Education <- 
  if(df$Country == "Country1") {
    factor(df$Education,
           levels = c(1:4),
           labels = c("No education", "Primary", "Secondary", "NA"))
  } else if (df$Country == "Country2") {
    factor(df$Education,
           levels = c(1:4),
           labels = c("No education", "Primary", "Secondary", "NA"))
  } else {
    factor(df$Education, 
           levels = c(1:4), 
           labels = c("No education", "Islamic education", "Primary", "Secondary")
    )
  }

Thanks
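The attempt above fails mainly because if() picks one branch for the whole data frame. It expects a single TRUE or FALSE, but df$Country == "Country1" is a vector of 100 values (R 4.2.0 and later stop with an error; earlier versions warned and used only the first element). In addition, "Country1" lacks the space in "Country 1", so it never matches, and the label "NA" creates a level literally named "NA" rather than a missing value. Below is a minimal vectorized sketch in base R, assuming the label table above. The names edu_labels and idx and the seed are illustrative. The shared level order (with "Islamic education" between "No education" and "Primary") is an assumption, not something the question specifies.

```r
# Label table from the question: rows are codes 1-4, columns are countries.
# NA marks a code that is not used in that country.
edu_labels <- cbind(
  "Country 1" = c("No education", "Primary", "Secondary", NA),
  "Country 2" = c("No education", "Primary", "Secondary", NA),
  "Country 3" = c("No education", "Islamic education", "Primary", "Secondary")
)

# Example data like the question's, with 100 random codes (seed is arbitrary)
set.seed(42)
df <- data.frame(
  Country = sample(colnames(edu_labels), 100, replace = TRUE),
  Education_1 = sample(1:4, 100, replace = TRUE),
  stringsAsFactors = FALSE
)

# Two-column index matrix: row = education code, column = that row's country.
# Indexing a matrix with it picks one cell per row of df, so no if() is needed.
idx <- cbind(df$Education_1, match(df$Country, colnames(edu_labels)))

# One factor with a shared level order (assumed ordering; adjust as needed)
df$Education <- factor(
  edu_labels[idx],
  levels = c("No education", "Islamic education", "Primary", "Secondary")
)

head(df)
```

Because the lookup is a plain matrix index, supporting another country only means adding a column to edu_labels.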

Answer:

Perhaps this helps. It stores the code-to-label table from the question as a data frame (one column per country) and converts it to long format, with one row per country and code.

Then left-join that lookup table onto the two-column data frame of countries and education codes, matching on both country and code.
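If you would rather avoid the tidyverse, the same two steps can be sketched in base R. This swaps in stack() for pivot_longer() and merge(..., all.x = TRUE) for left_join(); it is an alternative technique, not what the answer's code uses. The seed and the intermediate name long are illustrative.

```r
# Wide label table from the question: one column per country, NA = unused code
df_ed <- data.frame(
  Code = 1:4,
  Country.1 = c("No education", "Primary", "Secondary", NA),
  Country.2 = c("No education", "Primary", "Secondary", NA),
  Country.3 = c("No education", "Islamic education", "Primary", "Secondary"),
  stringsAsFactors = FALSE
)

# Wide -> long: stack() places the country columns one under another and
# returns 'values' (the label) and 'ind' (the source column name)
long <- stack(df_ed[-1])
df_levels <- data.frame(
  name  = sub(".", " ", as.character(long$ind), fixed = TRUE),
  Code  = rep(df_ed$Code, times = ncol(df_ed) - 1),
  value = long$values,
  stringsAsFactors = FALSE
)

set.seed(1)
df <- data.frame(
  Country = sample(c("Country 1", "Country 2", "Country 3"), 100, replace = TRUE),
  Education_1 = sample(1:4, 100, replace = TRUE),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of df, as dplyr::left_join() does
df1 <- merge(df, df_levels,
             by.x = c("Country", "Education_1"),
             by.y = c("name", "Code"),
             all.x = TRUE)
head(df1)
```

Unlike left_join(), merge() does not keep the original row order of df (by default it sorts by the key columns), so reorder afterwards if that matters.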

You could use the resulting 'value' column of education labels as a string, or recode the codes so they are consistent across countries.
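One way to make the codes consistent, sketched in base R under the assumption that a single shared level order is acceptable: turn 'value' into a factor with one common set of levels, then use each label's position in that set as a harmonized code. The small df1 below is hypothetical data in the shape produced by the join, and common_levels and Education_std are illustrative names.

```r
# Hypothetical rows in the shape of df1 from the join
df1 <- data.frame(
  Country = c("Country 1", "Country 2", "Country 3", "Country 3", "Country 1"),
  Education_1 = c(3L, 4L, 2L, 3L, 1L),
  value = c("Secondary", NA, "Islamic education", "Primary", "No education"),
  stringsAsFactors = FALSE
)

# One shared level order for all countries (assumed ordering)
common_levels <- c("No education", "Islamic education", "Primary", "Secondary")

# Same factor levels everywhere; codes unused in a country stay NA
df1$Education <- factor(df1$value, levels = common_levels)

# Harmonized integer code: the label's position in common_levels
df1$Education_std <- as.integer(df1$Education)

df1
```

After this, "Primary" gets code 3 in every country, whereas the original Education_1 used 2 for Countries 1 and 2 and 3 for Country 3.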

library(dplyr)
library(tidyr)
library(stringr)


df <- data.frame(
  Country = sample(c("Country 1", "Country 2", "Country 3"), 100, replace = TRUE), 
  Education_1 = sample(1:4))


df_ed <- data.frame(
  Code = 1:4,
  Country.1 = c("No education", "Primary", "Secondary", NA),
  Country.2 = c("No education", "Primary", "Secondary", NA),
  Country.3 = c("No education", "Islamic education", "Primary", "Secondary"),
  stringsAsFactors = FALSE
)

df_levels  <-  
  df_ed %>% 
  pivot_longer(-Code) %>% 
  mutate(name = str_replace(name, "\\.", " "))

df1 <- 
  df %>% 
  left_join(df_levels, by = c("Country" = "name", "Education_1" = "Code"))

head(df1)
#>     Country Education_1        value
#> 1 Country 1           3    Secondary
#> 2 Country 2           4         <NA>
#> 3 Country 3           1 No education
#> 4 Country 1           2      Primary
#> 5 Country 3           3      Primary
#> 6 Country 2           4         <NA>

Created on 2021-09-22 by the reprex package (v2.0.0)
