I have a data set with a variable 'education' which is coded differently in each of the three countries included, for example:
Code | Country 1 | Country 2 | Country 3 |
---|---|---|---|
1 | No education | No education | No education |
2 | Primary | Primary | Islamic education |
3 | Secondary | Secondary | Primary |
4 | NA | NA | Secondary |
I need to apply factor levels, which are different for each country.
Below is my attempt, but it doesn't appear to work:
df <- data.frame(
  Country = sample(c("Country 1", "Country 2", "Country 3"), 100, replace = TRUE),
  Education_1 = sample(1:4)
)
df$Education <-
  if (df$Country == "Country1") {
    factor(df$Education,
           levels = c(1:4),
           labels = c("No education", "Primary", "Secondary", "NA"))
  } else if (df$Country == "Country2") {
    factor(df$Education,
           levels = c(1:4),
           labels = c("No education", "Primary", "Secondary", "NA"))
  } else {
    factor(df$Education,
           levels = c(1:4),
           labels = c("No education", "Islamic education", "Primary", "Secondary")
    )
  }
Thanks
CodePudding user response:
Perhaps this helps? The idea is to store the mapping from country and education code to education category as a lookup table and reshape it to long format, so that there is one row per country and code.
Then left-join that lookup table onto the two-column data frame of countries and education codes.
You can use the resulting value column as a string, or recode it so that the categories are consistent across countries.
library(dplyr)
library(tidyr)
library(stringr)
df <- data.frame(
  Country = sample(c("Country 1", "Country 2", "Country 3"), 100, replace = TRUE),
  Education_1 = sample(1:4))
df_ed <- structure(list(Code = 1:4, Country.1 = c("No education", "Primary",
"Secondary", NA), Country.2 = c("No education", "Primary", "Secondary",
NA), Country.3 = c("No education", "Islamic education", "Primary",
"Secondary")), class = "data.frame", row.names = c(NA, -4L))
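# Reshape the lookup table to long format: one row per (Code, country) pair.
# pivot_longer() puts the old column names in `name` and the labels in `value`;
# the names were "Country.1" etc., so restore the space to match df$Country.
df_levels <-
  df_ed %>%
  pivot_longer(-Code) %>%
  mutate(name = str_replace(name, "\\.", " "))
# Attach the education label that matches each row's country and code.
df1 <-
  df %>%
  left_join(df_levels, by = c("Country" = "name", "Education_1" = "Code"))
head(df1)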
#> Country Education_1 value
#> 1 Country 1 3 Secondary
#> 2 Country 2 4 <NA>
#> 3 Country 3 1 No education
#> 4 Country 1 2 Primary
#> 5 Country 3 3 Primary
#> 6 Country 2 4 <NA>
Created on 2021-09-22 by the reprex package (v2.0.0)
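If you want a single factor with the same levels in every country, as the question asks, one option is to convert the looked-up label with an explicit level set. Below is a minimal sketch of that idea in base R only, so that it runs on its own. The names `lookup`, `idx` and `ed_levels`, the five example rows, and the level order are my own assumptions for illustration; adjust them to your data.

```r
# Lookup table: one row per (country, code) pair; NA where a code is unused.
lookup <- data.frame(
  Country = rep(c("Country 1", "Country 2", "Country 3"), each = 4),
  Code    = rep(1:4, times = 3),
  Label   = c("No education", "Primary", "Secondary", NA,
              "No education", "Primary", "Secondary", NA,
              "No education", "Islamic education", "Primary", "Secondary"),
  stringsAsFactors = FALSE
)

# Small fixed example (hypothetical rows) instead of random sampling.
df <- data.frame(
  Country     = c("Country 1", "Country 2", "Country 3", "Country 3", "Country 1"),
  Education_1 = c(2L, 3L, 2L, 4L, 4L),
  stringsAsFactors = FALSE
)

# Find the lookup row that matches each (country, code) pair.
idx <- match(
  paste(df$Country, df$Education_1),
  paste(lookup$Country, lookup$Code)
)

# Shared level order across countries (an assumption; reorder as needed).
ed_levels <- c("No education", "Islamic education", "Primary", "Secondary")
df$Education <- factor(lookup$Label[idx], levels = ed_levels)

print(df)
```

Because the labels are matched per country before the factor is built, code 2 becomes "Primary" in Country 1 but "Islamic education" in Country 3, while code 4 in Country 1 stays a true missing value rather than the string "NA".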