Problems using mean() in R

Time:01-11

I am having trouble with the mean() function in R. I get this warning when I run my code:

argument is not numeric or logical: returning NA.

The function works here:

data %>% filter(Sex == "M") %>% summarise(mean(weight))

But does not work here:

data %>% filter(Sex == "M") %>% mean(weight)

This code does not work either:

data %>% mean(weight)

I would be grateful for any help. Thank you :)

CodePudding user response:

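Use the exposition pipe (%$%) from magrittr, which exposes the columns of the data frame to the function on its right-hand side: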

library(magrittr)
mtcars %$% mean(mpg)
#> [1] 20.09062

Created on 2023-01-11 with reprex v2.0.2
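For comparison, base R's with() does much the same job as %$% without loading a package: it evaluates an expression with the data frame's columns visible by name. A minimal sketch using the same built-in mtcars data (the variable name avg_mpg is just for illustration):

```r
# Evaluate mean(mpg) with the columns of mtcars visible by name.
avg_mpg <- with(mtcars, mean(mpg))

# Same value as extracting the column explicitly with $.
all.equal(avg_mpg, mean(mtcars$mpg))

print(avg_mpg)
```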

CodePudding user response:

If you put mean() directly after the %>% pipe, you are passing an entire data frame to a function that cannot handle it: data %>% mean(weight) is evaluated as mean(data, weight), so the data frame becomes the first argument. As Richie pointed out, mean() expects a vector, not a data frame (see below). I show how to do this in both base R and dplyr so you can see the difference.
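To make the failure concrete, here is a small base R sketch that uses the built-in iris data as a stand-in for your data. It relies on the fact that x %>% f(y) is evaluated as f(x, y), so the piped call is written out by hand; warning_text and column_mean are illustrative names.

```r
# iris %>% mean(Sepal.Width) is evaluated as mean(iris, Sepal.Width):
# the whole data frame lands in mean()'s first argument, x.
warning_text <- tryCatch(
  mean(iris, Sepal.Width),
  warning = function(w) conditionMessage(w)
)
print(warning_text)

# A data frame is a list of columns, not a numeric vector.
is.numeric(iris)              # FALSE
is.numeric(iris$Sepal.Width)  # TRUE

# Handing mean() a single column works as expected.
column_mean <- mean(iris$Sepal.Width)
print(column_mean)
```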

Base R Method

If you run ?mean, the help page describes the first argument, x, as follows:

An R object. Currently there are methods for numeric/logical vectors and date, date-time and time interval objects. Complex vectors are allowed for trim = 0, only.

Since you did not supply data, I used R's built-in iris dataset to show why this matters. To get around the issue above in base R, filter a subset of the data, save it as a new data frame, and then apply mean() to the vector v.iris$Sepal.Width. This way, R understands what you are doing because you do the heavy lifting before supplying a vector to the function.

#### Summarise by Assignment (Base R) ####
v.iris <- iris[iris$Species == "versicolor",]
mean(v.iris$Sepal.Width)

This gives you the same value as the dplyr method below, just without a column name:

[1] 2.77
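If you would rather skip the intermediate data frame, the subsetting can happen inside the call. A small sketch under the same assumptions (built-in iris data; versicolor_mean and species_means are illustrative names), which also shows tapply() for getting every group's mean at once:

```r
# Subset the Sepal.Width vector directly, then take its mean.
versicolor_mean <- mean(iris$Sepal.Width[iris$Species == "versicolor"])
print(versicolor_mean)

# tapply() splits Sepal.Width by Species and applies mean() to each piece.
species_means <- tapply(iris$Sepal.Width, iris$Species, mean)
print(species_means)
```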

dplyr Method

Here I show you how to do it in dplyr in a cleaner way than you originally attempted. This way you can compare what the mean function is doing in each case while also getting tidier output. First, load dplyr:

#### Load Library ####
library(dplyr)

Here I did pretty much the same thing you did, using the same iris data, but named the result Mean.Width so the output has a cleaner column name. The "formula" is basically as follows: 1) take the data, 2) pipe the entire data frame into filter(), which keeps only the rows where Species is "versicolor", 3) pipe the filtered data into summarise(), which creates a column called Mean.Width, and 4) fill that column by applying mean() to the Sepal.Width column of the filtered data.

#### Summarise by Assignment (DPLYR Method) ####
iris %>% 
  filter(Species == "versicolor") %>% 
  summarise(Mean.Width = mean(Sepal.Width))

Which gives you this:

  Mean.Width
1       2.77
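
If you want a bare number rather than a one-row data frame, and want to stay in the pipeline as in your original attempt, one option is dplyr's pull(), which extracts a single column as a vector that mean() can then accept. A sketch, again using iris in place of your data (mean_width is an illustrative name):

```r
library(dplyr)

# pull() turns the Sepal.Width column into a plain numeric vector,
# so the next step in the pipe receives a vector, not a data frame.
mean_width <- iris %>%
  filter(Species == "versicolor") %>%
  pull(Sepal.Width) %>%
  mean()

print(mean_width)
```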