I'm just starting out with R and I'm practicing basic analysis with flight data. One of the columns in this df is the number of minutes a given flight was delayed. I tried finding the average using mean(), but it returns NA with the warning:
"argument is not numeric or logical: returning NA"
Obviously, this means the data is stored as something other than numbers. However, I've used as.numeric() to convert it to numbers (with no errors), and I used the str() function to verify that it's numeric, yet mean() is still not working. The weirdest part is that when I use the summary() function, it gives me the mean of that column! So R must know that there are numbers there! What is going on??
Here's a snippet of the data:
Origin  Dest  DelayMinutes
chr     chr   dbl
JFK     FLL   5
LAS     LAX   1
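A minimal reproduction of the symptom, as a sketch. It assumes the snippet is stored in a data frame named `delta_flights` (the name the second answer uses; the asker's object name isn't shown). It also assumes the failing call passed a data frame, or a one-column data frame, to mean() instead of the column vector. The asker's actual call isn't shown, so this is only one cause consistent with that warning.

```r
# Rebuild the question's snippet (hypothetical object name: delta_flights).
delta_flights <- data.frame(
  Origin = c("JFK", "LAS"),
  Dest = c("FLL", "LAX"),
  DelayMinutes = c(5, 1)
)

str(delta_flights)  # DelayMinutes shows as num: the column itself is numeric

# Passing a data frame (even a one-column one) to mean() triggers
# "argument is not numeric or logical: returning NA".
bad <- mean(delta_flights["DelayMinutes"])

# Passing the column *vector* works.
good <- mean(delta_flights$DelayMinutes)

# summary() works because it summarises each column of the data frame separately.
summary(delta_flights)
```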
User response:
This column probably contains NA values, so when you calculate the mean with the mean() function you need to pass the na.rm = TRUE argument:
mean(x, na.rm = TRUE)
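A small sketch of what na.rm changes, using a made-up vector (the values are illustrative, not from the asker's data). Note that NA values on their own make mean() return NA silently, without the "not numeric or logical" warning. If that warning appears, the argument is probably not a numeric vector at all.

```r
# Illustrative values only: a numeric vector with one missing delay.
delays <- c(5, NA, 1)

with_na <- mean(delays)                   # NA: a missing value propagates, no warning
without_na <- mean(delays, na.rm = TRUE)  # NA is dropped first, so (5 + 1) / 2
```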
User response:
Your syntax is incorrect: you are essentially trying to get the mean of a string (e.g., mean("string")). So you need to either take the mean of the column itself in base R or, if you want to use dplyr, compute it inside mutate().
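To show what "taking the mean of a string" means, here is a sketch that contrasts the column's *name* with the column's *values*. It assumes the same hypothetical `delta_flights` data frame built from the question's snippet.

```r
delta_flights <- data.frame(
  Origin = c("JFK", "LAS"),
  Dest = c("FLL", "LAX"),
  DelayMinutes = c(5, 1)
)

# A character string is not numeric, so mean() warns; capture the warning text.
msg <- tryCatch(mean("DelayMinutes"),
                warning = function(w) conditionMessage(w))

# Looking the name up in the data frame first gives the numeric values.
from_values <- mean(delta_flights[["DelayMinutes"]])
```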
In base R, you could write:
mean(delta_flights[,"DelayMinutes"], na.rm = TRUE)
#[1] 3
# Or like this:
mean(delta_flights$DelayMinutes, na.rm = TRUE)
#[1] 3
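Whether `[, "DelayMinutes"]` passes a vector to mean() depends on the object. A base data.frame drops a single selected column to a vector. A tibble keeps it as a one-column tibble, and mean() then warns. The question's `chr`/`dbl` type labels hint at a tibble, but that is an assumption. The sketch below mimics the tibble behaviour in base R with drop = FALSE. `[[` and `$` return the vector in both cases.

```r
delta_flights <- data.frame(
  Origin = c("JFK", "LAS"),
  Dest = c("FLL", "LAX"),
  DelayMinutes = c(5, 1)
)

# Base data.frame: selecting one column with [ , ] drops to a numeric vector.
dropped <- delta_flights[, "DelayMinutes"]
is.numeric(dropped)  # TRUE

# drop = FALSE keeps a one-column data frame (tibbles behave like this by default);
# mean() on it returns NA with the "not numeric or logical" warning.
kept <- delta_flights[, "DelayMinutes", drop = FALSE]
is.data.frame(kept)  # TRUE

# [[ ]] always extracts the column vector, for data frames and tibbles alike.
mean(delta_flights[["DelayMinutes"]], na.rm = TRUE)  # 3
```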
With dplyr, we could pipe into mutate() like this:
library(dplyr)
delta_flights %>%
  mutate(x = mean(DelayMinutes, na.rm = TRUE))
Output
Origin Dest DelayMinutes x
1 JFK FLL 5 3
2 LAS LAX 1 3
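To add the same column without dplyr, base R's transform() evaluates expressions with the data frame's columns in scope, much like mutate(). This is a swapped-in base R alternative, not part of the answer above. If you only need the single number, not a column repeated on every row, dplyr's summarise() is the usual choice.

```r
delta_flights <- data.frame(
  Origin = c("JFK", "LAS"),
  Dest = c("FLL", "LAX"),
  DelayMinutes = c(5, 1)
)

# The length-1 mean is recycled to every row, giving the same x column as mutate().
with_mean <- transform(delta_flights, x = mean(DelayMinutes, na.rm = TRUE))
with_mean
```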
As you can see, the result is the same as the mean reported by summary():
summary(delta_flights)
#     Origin              Dest            DelayMinutes
#  Length:2           Length:2           Min.   :1
#  Class :character   Class :character   1st Qu.:2
#  Mode  :character   Mode  :character   Median :3
#                                        Mean   :3
#                                        3rd Qu.:4
#                                        Max.   :5
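summary() on a data frame summarises each column separately, which is why its Mean row works even when mean() on the data frame does not. The sketch below pulls the same number out of summary() and averages every numeric column at once. It uses the same hypothetical `delta_flights`.

```r
delta_flights <- data.frame(
  Origin = c("JFK", "LAS"),
  Dest = c("FLL", "LAX"),
  DelayMinutes = c(5, 1)
)

# summary() of one numeric vector is a named numeric; "Mean" is one of its elements.
delay_summary <- summary(delta_flights$DelayMinutes)
delay_summary[["Mean"]]  # 3

# Average every numeric column: keep only numeric columns, then apply mean() to each.
numeric_cols <- Filter(is.numeric, delta_flights)
sapply(numeric_cols, mean, na.rm = TRUE)
```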