Group_by not working, summarize() computing identical values?

I am using the data found here: https://www.kaggle.com/cdc/behavioral-risk-factor-surveillance-system. In RStudio, I loaded the CSV file as BRFSS2015. I created two new columns to compare people who have arthritis with people who do not (arth and no_arth). Grouping by these variables, I am trying to find the mean and sd of their weights. The weight variable was generated from another variable in the dataset using this code: (weight = BRFSS2015$WEIGHT2). Below is the code I am trying to run for the mean and sd.

BRFSS2015%>%
  group_by(arth,no_arth)%>%
  summarize(mean_weight=mean(weight),
            sd_weight=sd(weight))

I am getting output that says the mean and sd for these two groups are identical. I doubt this is correct. Can someone check and tell me why this is happening? The numbers I am getting are:

arth: mean = 733.2044; sd = 2197.377
no_arth: mean = 733.2044; sd = 2197.377

Here is how I created the variables arth and no_arth:

a=BRFSS2015%>%
  select(HAVARTH3)%>%
  filter(HAVARTH3=="1")
b=BRFSS2015%>%
  select(HAVARTH3)%>%
  filter(HAVARTH3=="2")

as.data.frame(BRFSS2015)
arth=c(a)
no_arth=c(b)
BRFSS2015$arth <- c(arth, rep(NA, nrow(BRFSS2015)-length(arth)))
BRFSS2015$no_arth <- c(no_arth, rep(NA, nrow(BRFSS2015)-length(no_arth)))
as.tibble(BRFSS2015)

Before I started, I also removed NAs from weight using weight=na.omit(WEIGHT2)

Answer:

Based on the info you provided, one can only guess what went wrong in your analysis. Here is working code using a snippet of the real data.

library(tidyverse)

BRFSS2015_minimal <- structure(list(HAVARTH3 = c(
  1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2,
  1, 1, 1, 1, 1, 1, 2, 1, 2
), WEIGHT2 = c(
  280, 165, 158, 180, 142,
  145, 148, 179, 84, 161, 175, 150, 9999, 140, 170, 128, 200, 178,
  155, 163
)), row.names = c(NA, -20L), class = c(
  "tbl_df", "tbl",
  "data.frame"
))

BRFSS2015_minimal %>%
  filter(!is.na(WEIGHT2), HAVARTH3 %in% 1:2) %>%
  mutate(arth = HAVARTH3 == 1, no_arth = HAVARTH3 == 2, weight = WEIGHT2) %>%
  group_by(arth, no_arth) %>%
  summarize(
    mean_weight = mean(weight),
    sd_weight = sd(weight),
    .groups = "drop"
  )
#> # A tibble: 2 × 4
#>   arth  no_arth mean_weight sd_weight
#>   <lgl> <lgl>         <dbl>     <dbl>
#> 1 FALSE TRUE            165      10.8
#> 2 TRUE  FALSE           865    2629.
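
One plausible explanation for the identical values in the question, assuming weight existed only as a free-standing vector (created with weight = BRFSS2015$WEIGHT2) and not as a column of BRFSS2015: summarize() looks for a column first and, when it finds none, falls back to the variable in the calling environment, so every group is summarized over the same full vector. The sketch below reproduces that effect on a small made-up data frame (df and its values are illustrative, not taken from the real file).

```r
library(dplyr)

# Small illustrative data frame; the values are made up.
df <- data.frame(
  HAVARTH3 = c(1, 2, 1, 1, 2, 2),
  WEIGHT2  = c(280, 165, 158, 180, 148, 161)
)

# A free-standing vector, like the one created in the question.
# It is NOT a column of df.
weight <- df$WEIGHT2

# No column named `weight` exists, so summarize() uses the global vector
# and returns the same overall mean and sd for every group.
wrong <- df %>%
  group_by(HAVARTH3) %>%
  summarize(mean_weight = mean(weight), sd_weight = sd(weight))

# Referring to the column itself gives a separate value per group.
right <- df %>%
  group_by(HAVARTH3) %>%
  summarize(mean_weight = mean(WEIGHT2), sd_weight = sd(WEIGHT2))

print(wrong)
print(right)
```

This is why the working code above creates weight inside mutate(): the variable then lives in the data frame and is split by group along with everything else.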

Code used to create the dataset:

BRFSS2015 <- readr::read_csv("2015.csv")
 
BRFSS2015_minimal <- dput(head(BRFSS2015[c("HAVARTH3", "WEIGHT2")], 20))
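
Note that the sample data contains a WEIGHT2 value of 9999, which is what pushes the mean and sd of the arthritis group so high. The sketch below assumes that values of 1000 and above are special codes (for example "refused" or "don't know") rather than weights in pounds; please confirm this against the BRFSS codebook before relying on it. It also groups by a single label column instead of the two redundant logical columns. The data frame df, the cutoff of 1000, and the column name arthritis are illustrative choices, not part of the original code.

```r
library(dplyr)

# Small illustrative data frame; the values are made up.
df <- data.frame(
  HAVARTH3 = c(1, 2, 1, 1, 2, 1),
  WEIGHT2  = c(280, 165, 158, 9999, 148, 7777)
)

cleaned <- df %>%
  # Assumption: anything >= 1000 is a special code, not a weight in pounds.
  filter(HAVARTH3 %in% 1:2, !is.na(WEIGHT2), WEIGHT2 < 1000) %>%
  # One grouping column is enough to separate the two groups.
  mutate(arthritis = if_else(HAVARTH3 == 1, "arth", "no_arth")) %>%
  group_by(arthritis) %>%
  summarize(
    mean_weight = mean(WEIGHT2),
    sd_weight = sd(WEIGHT2),
    .groups = "drop"
  )

print(cleaned)
```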