How to filter a labelled tibble in R

Time:10-13

I want to filter the tibble df so that it keeps only the rows where s2 is 2 or 3. The resulting tibble df2 still shows the value 1 of s2 when I print the column.

How can I create a new tibble with only the filtered values of df?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(labelled)

df <- tibble(s1 = c("M", "M", "F", "M", "M", "F"),
             s2 = c(1, 1, 2, 1, 1, 3)) %>% 
  set_variable_labels(s1 = "Sex", s2 = "Question") %>%
  set_value_labels(s1 = c(Male = "M", Female = "F"), s2 = c(Yes = 1, No = 2, DK = 3))

df2 <- df %>% filter(s2 %in% c("2", "3"))

df2$s2
#> <labelled<double>[2]>: Question
#> [1] 2 3
#> 
#> Labels:
#>  value label
#>      1   Yes
#>      2    No
#>      3    DK
Created on 2022-10-12 with reprex v2.0.2

User response:

Your new tibble df2 does contain only the filtered values. Your output shows

#> [1] 2 3

which are the filtered results you want.

The extra details printed are attributes attached to the column, not data. They list every value label defined for s2, including labels for values that no longer occur in the filtered data:

#> Labels:
#>  value label
#>      1   Yes
#>      2    No
#>      3    DK
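
A minimal sketch of that distinction, reusing the df from the question. It checks the rows and the labels separately: the row count reflects the filter, while val_labels() from the labelled package still returns the full label set, since the labels are metadata stored on the column rather than on individual rows.

```r
library(dplyr)
library(labelled)

# Same data as in the question
df <- tibble(s1 = c("M", "M", "F", "M", "M", "F"),
             s2 = c(1, 1, 2, 1, 1, 3)) %>%
  set_variable_labels(s1 = "Sex", s2 = "Question") %>%
  set_value_labels(s1 = c(Male = "M", Female = "F"),
                   s2 = c(Yes = 1, No = 2, DK = 3))

df2 <- df %>% filter(s2 %in% c("2", "3"))

# The data: only the two rows with s2 equal to 2 or 3 remain
nrow(df2)

# The metadata: the label definitions are carried over unchanged,
# so the label for 1 (Yes) is still listed
val_labels(df2$s2)
```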

Edit based on update from OP:

To also drop the value labels that no longer occur in the filtered data, I think you need drop_unused_value_labels() from the labelled package:

df2 <- df %>% filter(s2 %in% c("2", "3")) %>% drop_unused_value_labels()

df2$s2
#> <labelled<double>[2]>: Question
#> [1] 2 3
#> 
#> Labels:
#>  value label
#>      2    No
#>      3    DK
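
One thing to be aware of: when drop_unused_value_labels() is applied to the whole data frame, it is applied to every labelled column. In this example the two remaining rows both have s1 equal to "F", so as far as I can tell the Male label is dropped from s1 as well. If that is not wanted, a possible variant (a sketch, not the only way to do it) is to apply the function to s2 alone inside mutate():

```r
library(dplyr)
library(labelled)

# Same data as in the question
df <- tibble(s1 = c("M", "M", "F", "M", "M", "F"),
             s2 = c(1, 1, 2, 1, 1, 3)) %>%
  set_variable_labels(s1 = "Sex", s2 = "Question") %>%
  set_value_labels(s1 = c(Male = "M", Female = "F"),
                   s2 = c(Yes = 1, No = 2, DK = 3))

# Drop unused value labels for s2 only; s1 keeps both of its labels
df2 <- df %>%
  filter(s2 %in% c("2", "3")) %>%
  mutate(s2 = drop_unused_value_labels(s2))

val_labels(df2$s2)  # only the labels for 2 (No) and 3 (DK) remain
val_labels(df2$s1)  # Male and Female are both still defined
```

The original df is not modified by either version, so its full label set stays available.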
