Summarise and group_by not working with factor variables

Time:10-03

I'm currently using the tidyverse package version 1.3.1, and when I run the following code:

library(tidyverse)

data <- data.frame(gender = c(1, 2, 1, 2, 2, 2, 2, 1, 2, 1),
                   age = c(18, 20, 21, 24, 25, 24, 24, 25, 22, 21))

data <- data %>%
  mutate(gender = factor(gender, levels = c("male", "female")))

data %>%
  group_by(gender) %>%
  summarise(mean = mean(age))

I get this result, with every row grouped under NA:

# A tibble: 1 × 2
  gender  mean
  <fct>  <dbl>
1 NA      22.4

Answer:

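You should set `labels`, not `levels`. `levels` has to list the values that actually occur in `gender` (1 and 2); any value not found in `levels` becomes NA, which is why every row ends up in a single NA group. `labels` keeps the default levels (the sorted unique values, 1 and 2) and renames them in that order.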

library(dplyr)

# start from the original numeric `data`, not the copy already converted with `levels`
data %>%
  mutate(gender = factor(gender, labels = c("male", "female"))) %>%
  group_by(gender) %>%
  summarise(mean = mean(age))

#  gender  mean
#  <fct>  <dbl>
#1 male    21.2
#2 female  23.2
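To see the difference in isolation, here is a small base-R sketch on a toy vector (the name `g` and its values are made up for illustration), comparing `levels`, `labels`, and both together:

```r
# toy vector of numeric codes (hypothetical data, not the question's)
g <- c(1, 2, 1, 2)

# levels are matched against the values themselves: 1 and 2 match
# neither "male" nor "female", so every element becomes NA
bad <- factor(g, levels = c("male", "female"))

# levels default to sort(unique(g)), i.e. 1 and 2; labels renames them in that order
good <- factor(g, labels = c("male", "female"))

# same result, but states explicitly which code gets which label
explicit <- factor(g, levels = c(1, 2), labels = c("male", "female"))

print(bad)
print(good)
print(explicit)
```

Passing both `levels` and `labels` is the safer choice when a code might be missing from the data, because the default levels include only the values that actually occur, and `labels` must then have the same length as those levels.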

Answer:

Converting to a factor isn't needed for recoding. Because `gender` is numeric (1 or 2), it can be used directly as an index into a vector of labels, so 1 picks "male" and 2 picks "female":

library(dplyr)

data %>%
  # gender is 1 or 2, so it indexes c("male", "female") directly
  group_by(gender = c("male", "female")[gender]) %>%
  summarise(mean = mean(age, na.rm = TRUE))

Output (the new column is character, so the groups are sorted alphabetically):

# A tibble: 2 × 2
  gender  mean
  <chr>  <dbl>
1 female  23.2
2 male    21.2
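The recoding step on its own is plain vector indexing. A minimal base-R sketch with made-up codes (the names `codes` and `lookup` are hypothetical):

```r
# hypothetical numeric codes, like the question's gender column
codes <- c(1, 2, 1, 2, 2)

# position 1 holds "male", position 2 holds "female"
lookup <- c("male", "female")

# each code selects the label stored at that position
recoded <- lookup[codes]
print(recoded)

# a code with no matching position (e.g. 3) gives NA instead of an error
print(lookup[c(1, 3)])
```

This relies on the codes being exactly 1 and 2; an unexpected code silently turns into NA, so it is worth checking for NA groups in the summary.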

Or, using `fct_recode()` from forcats:

library(forcats)

data %>%
  # fct_recode() takes new_name = "old_value" pairs
  group_by(gender = fct_recode(as.character(gender),
                               male = "1", female = "2")) %>%
  summarise(mean = mean(age, na.rm = TRUE))

Output:

# A tibble: 2 × 2
  gender  mean
  <fct>  <dbl>
1 male    21.2
2 female  23.2
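As a dependency-free cross-check of the means shown in the outputs above, base R's `tapply()` gives the same per-group means. This sketch rebuilds the question's original data and reuses the `labels` fix from the first answer; `gender_f` and `means` are illustrative names:

```r
# the question's original data, before any factor conversion
data <- data.frame(gender = c(1, 2, 1, 2, 2, 2, 2, 1, 2, 1),
                   age = c(18, 20, 21, 24, 25, 24, 24, 25, 22, 21))

# code 1 -> "male", code 2 -> "female"
gender_f <- factor(data$gender, labels = c("male", "female"))

# mean age for each level of gender_f
means <- tapply(data$age, gender_f, mean)
print(means)
```

The male rows' ages sum to 85 over 4 rows (21.25) and the female rows' to 139 over 6 rows (about 23.17); tibbles print numbers to three significant digits by default, hence 21.2 and 23.2.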