Applying factor labels based on a condition in R

I have a data set with a variable 'education' that is coded differently in each of the three countries it covers, for example:

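| Code | Country 1    | Country 2    | Country 3         |
|------|--------------|--------------|-------------------|
| 1    | No education | No education | No education      |
| 2    | Primary      | Primary      | Islamic education |
| 3    | Secondary    | Secondary    | Primary           |
| 4    | NA           | NA           | Secondary         |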

I need to turn this variable into a factor whose labels depend on the country.

Below is my attempt, but it doesn't appear to work:

```r
df <- data.frame(
  Country = sample(c("Country 1", "Country 2", "Country 3"), 100, replace = TRUE), 
  Education_1 = sample(1:4)
)

df$Education <- 
  if(df$Country == "Country1") {
    factor(df$Education,
           levels = c(1:4),
           labels = c("No education", "Primary", "Secondary", "NA"))
  } else if (df$Country == "Country2") {
    factor(df$Education,
           levels = c(1:4),
           labels = c("No education", "Primary", "Secondary", "NA"))
  } else {
    factor(df$Education, 
           levels = c(1:4), 
           labels = c("No education", "Islamic education", "Primary", "Secondary")
    )
  }
```
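For reference, a minimal sketch of why the attempt above fails and of a vectorised version of the same idea. `if()` evaluates exactly one TRUE/FALSE, but `df$Country == "Country1"` is a logical vector with one element per row (R 4.2.0 and later signal an error for this; older versions warn and use only the first element), and `"Country1"` without a space never matches the data, which uses `"Country 1"`. The per-country list `label_sets`, the seed, and treating code 4 in countries 1 and 2 as a real missing value rather than the string `"NA"` are assumptions made for illustration.

```r
set.seed(42)  # assumed seed, only so the example is reproducible
df <- data.frame(
  Country = sample(c("Country 1", "Country 2", "Country 3"), 100, replace = TRUE),
  Education_1 = sample(1:4, 100, replace = TRUE)
)

# Why the if/else chain fails: the condition has one value per row,
# and "Country1" (no space) never matches the data.
cond <- df$Country == "Country1"
length(cond)  # 100, but if() needs exactly one TRUE/FALSE
any(cond)     # FALSE

# Hypothetical per-country label vectors, indexed by the numeric code (1-4).
label_sets <- list(
  "Country 1" = c("No education", "Primary", "Secondary", NA),
  "Country 2" = c("No education", "Primary", "Secondary", NA),
  "Country 3" = c("No education", "Islamic education", "Primary", "Secondary")
)

# Vectorised fix: label each country's rows separately instead of branching once.
df$Education <- NA_character_
for (ctry in names(label_sets)) {
  rows <- df$Country == ctry
  df$Education[rows] <- label_sets[[ctry]][df$Education_1[rows]]
}

# One shared set of levels so the result is a single factor (assumed order).
df$Education <- factor(df$Education,
  levels = c("No education", "Islamic education", "Primary", "Secondary"))
table(df$Country, df$Education, useNA = "ifany")
```

Subsetting by country keeps the asker's idea of "different labels per country" while avoiding the scalar `if()`.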

Thanks

Answer:

Perhaps this helps. This approach stores the table that maps each country's education codes to categories as a data frame and reshapes it to long format, with one row per country and code.

It then left-joins that lookup onto the two-column data frame of countries and education codes, matching on both columns.

You can use the resulting `value` column (the education category as a string) directly, or use it to recode the codes so that they mean the same thing in every country.
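If you would rather avoid the reshape and join, the same lookup can be sketched in base R: store the table as a character matrix (codes as rows, countries as columns) and index it with a two-column matrix of (code, country) positions, which picks one cell per row of the data. The names `edu_labels` and the small `df` below are hypothetical.

```r
# Hypothetical lookup matrix: rows are codes 1-4, columns are countries.
edu_labels <- cbind(
  "Country 1" = c("No education", "Primary", "Secondary", NA),
  "Country 2" = c("No education", "Primary", "Secondary", NA),
  "Country 3" = c("No education", "Islamic education", "Primary", "Secondary")
)

# Small illustrative data set.
df <- data.frame(
  Country = c("Country 1", "Country 3", "Country 3", "Country 2"),
  Education_1 = c(2L, 2L, 4L, 4L)
)

# Indexing with a (row, column) matrix returns one element per data row.
df$Education <- edu_labels[cbind(df$Education_1, match(df$Country, colnames(edu_labels)))]
df$Education  # "Primary" "Islamic education" "Secondary" NA
```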

```r
library(dplyr)
library(tidyr)
library(stringr)

# Example data: country and numeric education code
df <- data.frame(
  Country = sample(c("Country 1", "Country 2", "Country 3"), 100, replace = TRUE),
  Education_1 = sample(1:4)  # 4 values, recycled to 100 rows
)

# Lookup table: one column of labels per country, one row per code
df_ed <- structure(
  list(
    Code = 1:4,
    Country.1 = c("No education", "Primary", "Secondary", NA),
    Country.2 = c("No education", "Primary", "Secondary", NA),
    Country.3 = c("No education", "Islamic education", "Primary", "Secondary")
  ),
  class = "data.frame", row.names = c(NA, -4L)
)

# Reshape to long format (Code, name, value) and turn "Country.1" into "Country 1"
df_levels <-
  df_ed %>%
  pivot_longer(-Code) %>%
  mutate(name = str_replace(name, "\\.", " "))

# Attach the label for each row by matching on both country and code
df1 <-
  df %>%
  left_join(df_levels, by = c("Country" = "name", "Education_1" = "Code"))

head(df1)
#>     Country Education_1        value
#> 1 Country 1           3    Secondary
#> 2 Country 2           4         <NA>
#> 3 Country 3           1 No education
#> 4 Country 1           2      Primary
#> 5 Country 3           3      Primary
#> 6 Country 2           4         <NA>
```

Created on 2021-09-22 by the reprex package (v2.0.0)
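Since the question asks for a factor, the `value` column produced above can be turned into one with a single shared set of levels; its integer codes then mean the same thing in every country, which is one way to do the recoding mentioned above. A minimal base-R sketch: the small stand-in for `df1` and the level order in `edu_levels` are assumptions.

```r
# Hypothetical stand-in for df1 from the join above.
df1 <- data.frame(
  Country = c("Country 1", "Country 2", "Country 3", "Country 3"),
  Education_1 = c(3L, 4L, 1L, 2L),
  value = c("Secondary", NA, "No education", "Islamic education")
)

# One shared, ordered set of categories across all three countries (assumed order).
edu_levels <- c("No education", "Islamic education", "Primary", "Secondary")

df1$Education <- factor(df1$value, levels = edu_levels)

levels(df1$Education)
as.integer(df1$Education)  # harmonised codes: 4 NA 1 2
```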