R max function gives NA

Time:10-24

I have the following data:

x <- as.data.frame(c(NA, NA, 5, 20, 25, 50, 75, 8, NA))

Why does max(x) produce NA as a result? NA cannot be the maximum. What is the logic behind this behavior?

In the end, only max(x, na.rm = TRUE) gives the correct maximum, but I wonder why. Thanks a lot in advance!

CodePudding user response:

  1. In R, a missing value (NA) means "unknown". Functions such as max, min, mean, and sd return NA when their input contains NA because that is how the missing values are taken into account: the missing element might be larger than every observed value, so the true maximum cannot be determined (see point 2 of my answer for a language that ignores missing values instead). Similarly, nchar returns NA for a missing character string (NA_character_), since a missing string is different from an empty character string (""). In addition, this behavior alerts the analyst that the computation involves missing values, in case they have not checked for them or forgot the na.rm = TRUE argument.

  2. Some other languages ignore missing values in this kind of operation. In SQL, for example, max() and min() return the highest and lowest non-missing values, which for your data are 75 and 5 respectively. Missing values (NULL in SQL) are never included in these results, so a NULL is never identified as the minimum or maximum. One could say that missing values are 'nowhere' in the SQL case, whereas in R they become the result.
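The two points can be checked directly in R. The following is a minimal sketch using the data from the question; the column name `value` and the variable names are illustrative choices, not part of the original post.

```r
# Same values as in the question; the column name `value` is an assumption.
x <- data.frame(value = c(NA, NA, 5, 20, 25, 50, 75, 8, NA))

# NA means "unknown", so a comparison against it is unknown as well.
unknown_cmp <- NA > 75

# Default behavior: the missing values propagate into the result.
max_default <- max(x)

# na.rm = TRUE drops the NA values before computing,
# which resembles how SQL aggregates skip NULL.
max_clean <- max(x, na.rm = TRUE)
min_clean <- min(x, na.rm = TRUE)

# A missing string is not the same thing as an empty string.
n_missing <- nchar(NA_character_)
n_empty <- nchar("")

print(c(max_default = max_default, max_clean = max_clean, min_clean = min_clean))
print(c(n_missing = n_missing, n_empty = n_empty))
```

Here `max_default` and `n_missing` are NA, while `max_clean` and `min_clean` are 75 and 5, matching the values SQL would report.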
