I have a dataset of mesh opening measurements and the tools used to take them. I want to run a one-way ANOVA on the data. Here's my code:
df<-structure(list(MeasurementTool = c("Wedge", "Wedge", "Wedge",
"Wedge", "Wedge", "Wedge", "Wedge", "Wedge", "Wedge", "Wedge",
"Wedge", "Wedge", "Wedge", "Wedge", "Wedge", "Wedge", "Wedge",
"Wedge", "Wedge", "Wedge", "Weighted Wedge", "Weighted Wedge",
"Weighted Wedge", "Weighted Wedge", "Weighted Wedge", "Weighted Wedge",
"Weighted Wedge", "Weighted Wedge", "Weighted Wedge", "Weighted Wedge",
"Weighted Wedge", "Weighted Wedge", "Weighted Wedge", "Weighted Wedge",
"Weighted Wedge", "Weighted Wedge", "Weighted Wedge", "Weighted Wedge",
"Weighted Wedge", "Weighted Wedge", "ICES Gauge", "ICES Gauge",
"ICES Gauge", "ICES Gauge", "ICES Gauge", "ICES Gauge", "ICES Gauge",
"ICES Gauge", "ICES Gauge", "ICES Gauge", "ICES Gauge", "ICES Gauge",
"ICES Gauge", "ICES Gauge", "ICES Gauge", "ICES Gauge", "ICES Gauge",
"ICES Gauge", "ICES Gauge", "ICES Gauge"),
MeshOpening = c(157L, 155L, 160L, 160L, 161L, 160L, 158L, 161L, 162L, 162L, 160L, 163L,
158L, 160L, 161L, 165L, 164L, 158L, 164L, 163L, 159L, 158L, 165L,
164L, 159L, 160L, 158L, 159L, 160L, 163L, 159L, 160L, 158L, 158L,
158L, 162L, 160L, 159L, 159L, 159L, 159L, 159L, 159L, 155L, 156L,
156L, 158L, 160L, 156L, 155L, 160L, 160L, 157L, 159L, 158L, 155L,
158L, 157L, 156L, 158L)), row.names = c(NA, -60L), class = "data.frame")
df$`MeasurementTool`<- as.factor(df$`MeasurementTool`)
group_by(df, 'MeasurementTool') %>% summarise(count = n(), mean = mean('MeshOpening', na.rm = TRUE), sd = sd('MeshOpening', na.rm = TRUE))
It is giving me these warning messages:
Warning messages:
1: In mean.default("MeshOpening", na.rm = TRUE) : argument is not numeric or logical: returning NA
2: In var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) : NAs introduced by coercion
CodePudding user response:
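You are getting tripped up by the way dplyr::summarise works. It expects an R name (a.k.a. symbol), i.e. no quotes around the letters. With quotes, 'MeshOpening' is just a character string, so mean() and sd() receive the literal text rather than the column, which is what the warnings are telling you: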
group_by(df, 'MeasurementTool') %>% summarise(count = n(), mean = mean(MeshOpening, na.rm = TRUE), sd = sd(MeshOpening, na.rm = TRUE))
# A tibble: 1 × 4
`"MeasurementTool"` count mean sd
<chr> <int> <dbl> <dbl>
1 MeasurementTool 60 159. 2.48
In the pre-tidyverse days we would often refer to columns by their character-valued names, as you did, but many people seem to like treating column names as first-class objects, which is now the norm in the tidyverse.
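If you do want to keep the column names as character strings, a minimal sketch along these lines should work, assuming dplyr 1.0.0 or later. The `toy` data frame is only the first four readings per tool from your data, and `group_col` and `value_col` are names I made up for illustration:

```r
library(dplyr)

# Small subset of the question's data: first four readings per tool
toy <- data.frame(
  MeasurementTool = rep(c("Wedge", "Weighted Wedge", "ICES Gauge"), each = 4),
  MeshOpening = c(157L, 155L, 160L, 160L,
                  159L, 158L, 165L, 164L,
                  159L, 159L, 159L, 155L)
)

# Hypothetical variables holding the column names as strings
group_col <- "MeasurementTool"
value_col <- "MeshOpening"

res <- toy %>%
  # all_of() turns the string into a column selection for grouping
  group_by(across(all_of(group_col))) %>%
  # .data[[...]] looks the column up by its string name
  summarise(count = n(),
            mean = mean(.data[[value_col]], na.rm = TRUE),
            sd = sd(.data[[value_col]], na.rm = TRUE))

print(res)
```

The point is that the string has to be converted into a column reference explicitly; a bare quoted string is never looked up as a column.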
Even better would be to fix not only the cause of the warnings but also the grouping, so that you get what you really wanted:
group_by(df, MeasurementTool) %>% summarise(count = n(),
mean = mean(MeshOpening, na.rm = TRUE),
sd = sd(MeshOpening, na.rm = TRUE))
# A tibble: 3 × 4
MeasurementTool count mean sd
<fct> <int> <dbl> <dbl>
1 ICES Gauge 20 158. 1.73
2 Wedge 20 161. 2.56
3 Weighted Wedge 20 160. 2.06
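Since the stated goal was a one-way ANOVA, here is a rough sketch of that step using base R's aov(). It runs on a small subset (the first four readings per tool) only to stay short; substitute your full df. The name `fit` is my own choice:

```r
# Small subset of the question's data: first four readings per tool
toy <- data.frame(
  MeasurementTool = factor(rep(c("Wedge", "Weighted Wedge", "ICES Gauge"), each = 4)),
  MeshOpening = c(157L, 155L, 160L, 160L,
                  159L, 158L, 165L, 164L,
                  159L, 159L, 159L, 155L)
)

# One-way ANOVA: does mean mesh opening differ between tools?
fit <- aov(MeshOpening ~ MeasurementTool, data = toy)

# ANOVA table with the F statistic and p-value
print(summary(fit))

# Optional follow-up: pairwise comparisons between tools
print(TukeyHSD(fit))
```

Note that the formula interface takes bare column names as well, so the same "no quotes" habit applies here.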
Arguably the group_by function ought to throw an error, or at least a warning, when its second argument does not resolve to a column name. Instead it silently treats the quoted string as a constant value, which is why the first result above has a single group of 60 rows.
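To see that behaviour for yourself, something like the following should do, again on a small subset of your data. The stopifnot() line is just one possible guard you could add yourself, not something dplyr provides for this case:

```r
library(dplyr)

# Small subset of the question's data: first four readings per tool
toy <- data.frame(
  MeasurementTool = rep(c("Wedge", "Weighted Wedge", "ICES Gauge"), each = 4),
  MeshOpening = c(157L, 155L, 160L, 160L,
                  159L, 158L, 165L, 164L,
                  159L, 159L, 159L, 155L)
)

# Quoted: the string becomes a constant, so every row lands in one group
quoted <- group_by(toy, 'MeasurementTool')
n_quoted <- n_groups(quoted)

# Unquoted: the column is used, giving one group per tool
unquoted <- group_by(toy, MeasurementTool)
n_unquoted <- n_groups(unquoted)

print(c(quoted = n_quoted, unquoted = n_unquoted))

# A hand-rolled guard when the column name arrives as a string
group_col <- "MeasurementTool"
stopifnot(group_col %in% names(toy))
```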