I'm working with British Election Study data. To use it in R, it first has to be converted from the .dta format provided, which I think attaches labels to a lot of variables. Most of the time this is useful, but my problem arises in the cases where it isn't.
Using as_factor() blindly converts all variables with labels to factors. Is there a way to specify that only certain vectors are converted? i.e.
new_df <- data %>%
as_factor(just_this_column)
Failing that, is there a good way to remove the labels of certain variables within a dataframe? I've looked at the sjlabelled package, but this does something weird and converts the data from a dataframe:
example_data<- str(sjlabelled::remove_all_labels(example_data$generalElectionVoteW19))
The reason I'm trying to do all of this is to make a histogram of the number of people voting for each party (the factor) at a certain age. In this dataset, the age variable has a label which is messing up the code.
Of course, I could just convert the factor to a numeric value at the end, but this seems like a messy way of achieving things!
Here is the dput:
structure(list(ageW19 = structure(c(72, 52, 39, 75, 26, 56), label = "Age", format.stata = "%8.0g", labels = c(`Not Asked` = -9,
Skipped = -8), class = c("haven_labelled", "vctrs_vctr", "double"
)), generalElectionVoteW19 = structure(c(1, 13, 3, 1, 2, 1), label = "General election vote intention (recalled vote in post-election waves)", format.stata = "@.0g", labels = c(`I would/did not vote` = 0,
Conservative = 1, Labour = 2, `Liberal Democrat` = 3, `Scottish National Party (SNP)` = 4,
`Plaid Cymru` = 5, `United Kingdom Independence Party (UKIP)` = 6,
`Green Party` = 7, `British National Party (BNP)` = 8, Other = 9,
`Change UK- The Independent Group` = 11, `Brexit Party` = 12,
`An independent candidate` = 13, `Don't know` = 9999), class = c("haven_labelled",
"vctrs_vctr", "double"))), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"), na.action = c(`1` = 1L, `3` = 3L, `5` = 5L
))
CodePudding user response:
To answer your first question, you need mutate() to convert a single column, e.g.
library(dplyr)  # mutate() and %>%
library(haven)  # as_factor() for labelled vectors
new_df <- data %>%
  mutate(factor_column = as_factor(old_column))
However, as you said, you probably want to convert to a numeric type, so you might want to use as.numeric instead of as_factor.
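Applied to the columns from the question, the two conversions might look like the sketch below. It assumes haven and dplyr are installed; the tibble `bes` is a hypothetical stand-in rebuilt from the posted dput with the label set shortened, and `bes_clean` is just an illustrative name.

```r
library(dplyr)
library(haven)

# Hypothetical stand-in for the BES data: the same six rows as the dput,
# with the value labels abbreviated for brevity.
bes <- tibble(
  ageW19 = labelled(
    c(72, 52, 39, 75, 26, 56),
    labels = c(`Not Asked` = -9, Skipped = -8),
    label = "Age"
  ),
  generalElectionVoteW19 = labelled(
    c(1, 13, 3, 1, 2, 1),
    labels = c(Conservative = 1, Labour = 2, `Liberal Democrat` = 3,
               `An independent candidate` = 13)
  )
)

bes_clean <- bes %>%
  mutate(
    # Age becomes a plain numeric vector
    ageW19 = as.numeric(ageW19),
    # Vote becomes a factor whose levels are the party names
    generalElectionVoteW19 = as_factor(generalElectionVoteW19)
  )

str(bes_clean)
```

Only the columns named inside mutate() are touched, so any other labelled columns in the data frame keep their labels.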
CodePudding user response:
We may use base R:
data$factor_column <- factor(data$old_column)
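For the second part of the question (removing labels from selected columns only), one further option is haven's zap_labels() combined with dplyr's across(). This is a sketch rather than something from the answers above: the two-row tibble `df` is made up for illustration, and only the column names are taken from the question.

```r
library(dplyr)
library(haven)

# Hypothetical two-row example; column names follow the question
df <- tibble(
  ageW19 = labelled(c(72, 52), labels = c(Skipped = -8), label = "Age"),
  generalElectionVoteW19 = labelled(
    c(1, 2),
    labels = c(Conservative = 1, Labour = 2)
  )
)

# Strip the value labels from the age column only;
# the vote column is left as a labelled vector.
df_zapped <- df %>%
  mutate(across(c(ageW19), zap_labels))

class(df_zapped$ageW19)
class(df_zapped$generalElectionVoteW19)
```

More columns can be listed inside c(), so the same call would cover several variables while the result stays a data frame.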