I have some issues with the mean() function in R. I get this error when running my code:
argument is not numeric or logical: returning NA.
The function works here:
data %>% filter(Sex == "M") %>% summarise(mean(weight))
But does not work here:
data %>% filter(Sex == "M") %>% mean(weight)
This code does not work either:
data %>% mean(weight)
I would be grateful for any help. Thank you :)
CodePudding user response:
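Use the exposition pipe, %$%, from magrittr. It exposes the columns of the data frame to the function on the right-hand side, so mean() receives the mpg vector rather than the whole data frame: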
library(magrittr)
mtcars %$% mean(mpg)
#> [1] 20.09062
Created on 2023-01-11 with reprex v2.0.2
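If you would rather not load magrittr, base R's with() does much the same job as far as I can tell. A minimal sketch using the same mtcars data (the name mean_mpg is just for illustration):

```r
# with() evaluates the expression with the data frame's columns in scope,
# so mean() is handed the mpg vector, not the data frame
mean_mpg <- with(mtcars, mean(mpg))
mean_mpg
```

This should print the same value as the %$% version above.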
CodePudding user response:
If you just supply mean after the %>% pipe, you are passing a whole data frame to a function that cannot handle it. The pipe inserts its left-hand side as the first argument, so data %>% mean(weight) is the same call as mean(data, weight). As Richie pointed out, this fails because mean is a function that expects a vector, not a data frame (see below). I show below how to do this in base R and dplyr so you can hopefully see the difference.
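To make the failure concrete, here is a small sketch using the built-in mtcars data in place of your data (the names piped_wrong and piped_right are only illustrative). It also shows one possible fix with dplyr's pull(), which extracts a single column as a vector before it reaches mean:

```r
library(dplyr)  # provides %>% and pull()

# x %>% f(y) is rewritten as f(x, y), so this is the call the failing pipe makes.
# mean() receives a data frame as x, warns, and returns NA.
piped_wrong <- suppressWarnings(mean(mtcars, mpg))

# pull() extracts one column as a plain vector, which mean() can handle.
piped_right <- mtcars %>% pull(mpg) %>% mean()
```

Here piped_wrong is NA, while piped_right matches mean(mtcars$mpg).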
Base R Method
If you run ?mean, the first argument is "x", which is documented as follows:
An R object. Currently there are methods for numeric/logical vectors and date, date-time and time interval objects. Complex vectors are allowed for trim = 0, only.
Since you did not supply data, I used R's built-in iris dataset to show why this matters. To get around the issue above, the base R approach is to filter a subset of the data, save it as a new data frame, and then apply mean to the vector v.iris$Sepal.Width. This way, R understands what you are doing because you do the heavy lifting before supplying a vector to the function.
#### Summarise by Assignment (Base R) ####
v.iris <- iris[iris$Species == "versicolor",]
mean(v.iris$Sepal.Width)
Giving you an unnamed version of the dplyr method:
[1] 2.77
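If you do not want the intermediate data frame, the same idea can probably be written in one step by indexing the vector directly. This is just a sketch of an alternative, not the only base R way; the names versicolor_mean and species_means are made up for illustration:

```r
# Index the Sepal.Width vector directly with a logical condition
versicolor_mean <- mean(iris$Sepal.Width[iris$Species == "versicolor"])

# tapply() applies mean() to Sepal.Width within each level of Species,
# returning one mean per species
species_means <- tapply(iris$Sepal.Width, iris$Species, mean)
```

In both cases mean only ever sees a numeric vector, which is the point made above.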
dplyr Method
Here I show you how to do it in dplyr in a cleaner way than you originally attempted. This way you can compare what the mean function is doing while also getting tidier output. First, load dplyr:
#### Load Library ####
library(dplyr)
Here I did pretty much the same thing you did with the same iris data, but assigned the result to a column called Mean.Width so it has cleaner naming. The "formula" is basically as follows: 1) take the data, 2) pipe the entire dataset into filter, which keeps only the rows where Species is "versicolor", 3) inside summarise, create a column called "Mean.Width", and 4) apply mean to the Sepal.Width column of the filtered data to get the value of "Mean.Width".
#### Summarise by Assignment (DPLYR Method) ####
iris %>%
  filter(Species == "versicolor") %>%
  summarise(Mean.Width = mean(Sepal.Width))
Which gives you this:
  Mean.Width
1       2.77
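If you eventually want the mean for every species rather than just one, a possible extension is to swap filter for group_by. This is a sketch under the same assumptions as above (iris data, dplyr loaded); species_widths is just an illustrative name:

```r
library(dplyr)

# group_by() splits the rows by Species;
# summarise() then computes one mean per group
species_widths <- iris %>%
  group_by(Species) %>%
  summarise(Mean.Width = mean(Sepal.Width))

species_widths
```

The result has one row per species, and the versicolor row should match the 2.77 shown above.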