Removing Incorrect Labels within Tidyverse/ Limiting Actions of as_factor()

Time:10-10

I'm working with British Election Study data. To use it in R, it first has to be converted from the .dta format it is provided in, which I think attaches labels to a lot of variables. Most of the time this is useful, but I think my problem arises in the cases where it isn't.

Using as_factor() blindly converts all variables with labels to factors. Is there a way to specify that only certain columns are converted? i.e.


new_df <- data %>%
          as_factor(just_this_column)

Failing that, is there a good way to remove the labels of certain variables within a data frame? I've looked at the sjlabelled package, but this does something weird and converts the data from a data frame:

example_data<- str(sjlabelled::remove_all_labels(example_data$generalElectionVoteW19))

The reason I'm trying to do all of this is to make a histogram of the number of people voting for each party (the factor) at a certain age. In this dataset, the age variable has a label, which is messing up the code.

Of course, I could just convert the factor to a numeric value at the end, but this seems like a messy way of achieving things!

Here is the dput:


structure(list(ageW19 = structure(c(72, 52, 39, 75, 26, 56), label = "Age", format.stata = "%8.0g", labels = c(`Not Asked` = -9, 
Skipped = -8), class = c("haven_labelled", "vctrs_vctr", "double"
)), generalElectionVoteW19 = structure(c(1, 13, 3, 1, 2, 1), label = "General election vote intention (recalled vote in post-election waves)", format.stata = "@.0g", labels = c(`I would/did not vote` = 0, 
Conservative = 1, Labour = 2, `Liberal Democrat` = 3, `Scottish National Party (SNP)` = 4, 
`Plaid Cymru` = 5, `United Kingdom Independence Party (UKIP)` = 6, 
`Green Party` = 7, `British National Party (BNP)` = 8, Other = 9, 
`Change UK- The Independent Group` = 11, `Brexit Party` = 12, 
`An independent candidate` = 13, `Don't know` = 9999), class = c("haven_labelled", 
"vctrs_vctr", "double"))), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"), na.action = c(`1` = 1L, `3` = 3L, `5` = 5L
))

CodePudding user response:

To answer your first question: use mutate() to convert a single column, e.g.

library(dplyr)
library(haven)
new_df <- data %>%
  mutate(factor_column = as_factor(old_column))

However, as you said, you probably want to convert to a numeric type, so you might want to use as.numeric() instead of as_factor() there.
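Putting both suggestions together, here is a minimal sketch, assuming the haven and dplyr packages are installed. The small `example_data` tibble is a cut-down stand-in for the dput in the question (only a few of the value labels are reproduced), and `cleaned` is just an illustrative name. It uses haven's `zap_labels()` to drop the value labels from the age column only, and `as_factor()` on the vote column only:

```r
library(dplyr)
library(haven)

# Stand-in data: a reduced version of the dput from the question (assumption:
# only a subset of the original value labels is reproduced here).
example_data <- tibble(
  ageW19 = labelled(
    c(72, 52, 39, 75, 26, 56),
    labels = c("Not Asked" = -9, "Skipped" = -8),
    label = "Age"
  ),
  generalElectionVoteW19 = labelled(
    c(1, 13, 3, 1, 2, 1),
    labels = c(Conservative = 1, Labour = 2, "Liberal Democrat" = 3,
               "An independent candidate" = 13)
  )
)

cleaned <- example_data %>%
  mutate(
    # Drop the value labels from age only, leaving a plain numeric column
    ageW19 = as.numeric(zap_labels(ageW19)),
    # Convert only the vote column to a factor, using the label text as levels
    generalElectionVoteW19 = as_factor(generalElectionVoteW19)
  )

str(cleaned)
```

Every other column in the data frame is left untouched, so labels that are still useful elsewhere are kept.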

CodePudding user response:

We may use base R

data$factor_column <- factor(data$old_column)
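If the label text, rather than the underlying numeric codes, is wanted as the factor levels, one base R option is to read the `labels` attribute off the column and pass it to `factor()` explicitly. This is only a sketch: the vector `x` below is a hypothetical stand-in built to look like the vote column in the question, with a reduced set of labels.

```r
# Hypothetical stand-in for data$generalElectionVoteW19 (subset of the labels)
x <- structure(
  c(1, 13, 3, 1, 2, 1),
  labels = c(Conservative = 1, Labour = 2, `Liberal Democrat` = 3,
             `An independent candidate` = 13),
  class = c("haven_labelled", "vctrs_vctr", "double")
)

# Named numeric vector: names are the label text, values are the codes
labs <- attr(x, "labels", exact = TRUE)

# Strip the class and attributes, then map codes to label text
vote_factor <- factor(
  as.vector(unclass(x)),
  levels = unname(labs),
  labels = names(labs)
)

vote_factor
```

Any code without a matching label becomes `NA` with this approach, so it is worth checking for unexpected missing values afterwards.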