Removing excess classes for a whole dataframe

Time:11-09

I have data like the following. Unfortunately, I can't reproduce my actual problem with this example:

dat <- structure(c(1, NA_real_), format.stata = "%8.0g", labels = c(female = 1, 
male = 2), class = c("haven_labelled", "vctrs_vctr", "double"
))

dat <- data.frame(dat)

lapply(dat, class)

$dat
[1] "haven_labelled" "vctrs_vctr"     "double"

I would like to remove the custom labels, and I have tried a couple of things. First:

clear.labels <- function(x) {
  if(is.list(x)) {
    for(i in seq_along(x)) {
      class(x[[i]]) <- setdiff(class(x[[i]]), 'labelled') 
      attr(x[[i]],"label") <- NULL
    } 
  } else {
    class(x) <- setdiff(class(x), "labelled")
    attr(x, "label") <- NULL
  }
  return(x)
}

dat <- clear.labels(dat)

However, this does not work because the class is haven_labelled, not labelled. Obviously I could change the name in the function, but I would rather have something that works independently of the class name.

lapply(dat, class)
$dat
[1] "haven_labelled" "vctrs_vctr"     "double"        

I also tried:

dat <- data.frame(lapply(dat, unclass))

lapply(dat, class)

$dat
[1] "numeric"

For my actual data, however, it does not seem to work, even though it has exactly the same data.

Are there any other options I could try?

EDIT: Would it not be possible to simply make the last class the only class?

CodePudding user response:

Use haven’s zap_*() functions:

library(haven)

zapped <- dat |>
  zap_labels() |>
  zap_formats()

zapped
#   dat
# 1   1
# 2  NA

class(zapped$dat)
# "numeric"
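
If depending on haven is not an option, the idea from the question's EDIT (keep only the underlying base type) can be sketched in base R. This is a minimal sketch, not something haven provides: strip_to_base() is a hypothetical helper name, and the example column is assigned into the data frame with $<- so that the snippet runs without haven or vctrs loaded. It drops every attribute, so factors would collapse to their integer codes and dates to plain numbers. If the data contains such columns, restrict it to the labelled ones.

```r
# Hypothetical helper (not from haven): reduce every column to its bare base
# vector by dropping the class attribute and all other attributes.
strip_to_base <- function(df) {
  df[] <- lapply(df, function(col) {
    col <- unclass(col)       # remove the class attribute, whatever its name
    attributes(col) <- NULL   # remove labels, format.stata, label, ...
    col
  })
  df
}

# Rebuild the labelled column from the question.
labelled_col <- structure(
  c(1, NA_real_),
  format.stata = "%8.0g",
  labels = c(female = 1, male = 2),
  class = c("haven_labelled", "vctrs_vctr", "double")
)

# Start from a plain data frame and swap the labelled column in.
df <- data.frame(dat = c(0, 0))
df$dat <- labelled_col

cleaned <- strip_to_base(df)
lapply(cleaned, class)
```

Because the helper never looks at the class name, it should behave the same for labelled, haven_labelled, or any other wrapper class. The trade-off is that it cannot tell a label class apart from a class you want to keep.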