R: Using rnorm() ignoring NAs

Time:02-22

In R, I am trying to generate a column of normally distributed random values based on columns existing in the dataframe. As there are NAs in the columns I am using, NAs are returned in the new column. Is there a way I can ignore these NAs?

I've used the built-in "airquality" data set as a dummy example to illustrate my problem as it includes NAs.

Example Code:

> airquality$random <- rnorm(n = nrow(airquality), mean = (50 + airquality$Ozone*1.2 + airquality$Solar.R*0.5 + airquality$Wind*3 + airquality$Temp*0.2), sd = 5)
Warning message:
In rnorm(n = nrow(airquality), mean = (50 + airquality$Ozone * 1.2 +  :
  NAs produced
> head(airquality)
  Ozone Solar.R Wind Temp Month Day   random
1    41     190  7.4   67     5   1 222.5487
2    36     118  8.0   72     5   2 191.8911
3    12     149 12.6   74     5   3 188.5913
4    18     313 11.5   62     5   4 273.7623
5    NA      NA 14.3   56     5   5      NaN
6    28      NA 14.9   66     5   6      NaN

Rows 5 and 6 have NaN in the "random" column. I would like the NA values to be ignored (NAs could be in any column), so that row 5 would have a value of 100.22697* and row 6 would have a value of 143.1274*.

*These values were obtained by manually removing the columns containing NA from the code, but this isn't practical if I have a data set of thousands of rows and tens of columns.

Including na.rm = TRUE within the mean = () section returns an error message:

> airquality$random <- rnorm(n = nrow(airquality), mean = (50 + airquality$Ozone*1.2 + airquality$Solar.R*0.5 + airquality$Wind*3 + airquality$Temp*0.2, na.rm = TRUE), sd = 5)
Error: unexpected ',' in "airquality$random <- rnorm(n = nrow(airquality), mean = (50 + airquality$Ozone*1.2 + airquality$Solar.R*0.5 + airquality$Wind*3 + airquality$Temp*0.2,"

If I put na.rm = TRUE at the end of the rnorm() section instead, it returns a different error message:

> airquality$random <- rnorm(n = nrow(airquality), mean = (50 + airquality$Ozone*1.2 + airquality$Solar.R*0.5 + airquality$Wind*3 + airquality$Temp*0.2), sd = 5, na.rm = TRUE)
Error in rnorm(n = nrow(airquality), mean = (50 + airquality$Ozone * 1.2 +  :
  unused argument (na.rm = TRUE)

CodePudding user response:

The columns contain NA values, and any arithmetic operation (such as +) involving NA returns NA, so the computed mean is NA for those rows. One option is to replace the NAs with 0 and then use the result to calculate the mean; because the mean is a sum of terms, a 0 simply drops that term. Also, na.rm = TRUE cannot work in the OP's code: it is an argument of functions such as mean(), not of the mean parameter of rnorm().

tmp <- replace(airquality, is.na(airquality), 0)
rnorm(n = nrow(tmp), mean = (50 + tmp$Ozone*1.2 +
     tmp$Solar.R*0.5 + tmp$Wind*3 + tmp$Temp*0.2), sd = 5)

-output

[1] 230.54274 175.63684 191.09352 272.75306 108.56648 147.00571 263.74854 176.54437 138.34712 193.08597  91.91993 244.38106 246.75853 254.24543
 [15] 151.91976 278.17564 291.79558 166.25150 291.45414 124.37602  91.79503 285.40843 107.71572 180.18917 133.19906 230.15677  85.62659 135.79484
 [29] 290.33483 328.01940 268.59444 233.47771 240.75695 231.80327 182.09191 204.27378 239.15409 197.93795 222.74681 338.83640 305.89129 226.42353
 [43] 221.24226 190.21195 269.85521 260.70751 223.62116 319.93789 135.87652 171.82266 174.49938 159.01576 102.26742 128.54807 211.20653 147.96291
 [57] 147.09487 118.30181 137.67034 128.21919 155.11839 380.05157 276.10527 247.92156 142.26157 249.46309 308.70804 299.31816 344.92304 341.46871
 [71] 276.41410 162.72329 255.91151 224.55619 253.61126 139.57793 284.49149 277.79043 313.16401 273.82248 284.77515 109.29104 219.85529 249.98464
 [85] 331.40262 334.25997 155.60046 209.86921 308.24376 283.52677 287.17854 289.04658 173.97213 135.25243 148.04647 180.20284 135.46867 159.87413
 [99] 354.59991 317.06474 324.99885 207.30217 176.03234 243.66570 262.85251 251.59681 127.67214 161.26074 174.00305 177.45175 247.33476 246.72993
[113] 262.35549 136.78726 226.31141 259.75610 397.83067 293.34740 156.55135 289.65580 322.11876 301.68719 280.98863 297.88105 276.24682 251.60182
[127] 290.87347 188.69579 200.89883 246.30607 242.00302 232.00063 250.95042 279.32326 268.79634 235.51977 130.42808 170.68919 258.11289 241.38551
[141] 122.56174 239.76333 206.85573 230.98431 120.80214 214.62313 126.46720 135.36000 213.49774 178.97910 216.34848 175.17351 231.16300
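
As a rough check of the reasoning above, the following sketch shows NA propagating through the sum, rnorm() returning NaN for an NA mean, and the zero-replaced means for rows 5 and 6. The seed value and the intermediate name mn are assumptions added for illustration; they are not part of the original answer.

```r
# NA propagates through arithmetic: one missing term makes the whole sum NA
50 + NA * 1.2 + 14.3 * 3 + 56 * 0.2

# rnorm() yields NaN (with a warning) wherever the mean is NA
suppressWarnings(rnorm(n = 2, mean = c(100, NA), sd = 5))

# Replacing NA with 0 removes the missing terms from the sum
tmp <- replace(airquality, is.na(airquality), 0)
mn <- 50 + tmp$Ozone * 1.2 + tmp$Solar.R * 0.5 + tmp$Wind * 3 + tmp$Temp * 0.2

# Row 5: 50 + 14.3*3 + 56*0.2 = 104.1
# Row 6: 50 + 28*1.2 + 14.9*3 + 66*0.2 = 141.5
mn[5:6]

set.seed(1)  # arbitrary seed, only to make the draw reproducible
airquality$random <- rnorm(n = nrow(airquality), mean = mn, sd = 5)
anyNA(airquality$random)  # FALSE: every row now gets a value
```

Note that this treats a missing measurement as contributing nothing to the mean, which matches what the question asks for but may not suit every analysis.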

Another option is to multiply each column by its weight separately and then sum the results with rowSums(), which accepts na.rm = TRUE:

# 50 is the constant term and Wind is weighted by 3, as in the question
mn <- 50 + rowSums(transform(airquality, Ozone = Ozone * 1.2,
   Solar.R = Solar.R * 0.5, Wind = Wind * 3,
 Temp = Temp * 0.2)[c("Ozone", "Solar.R", "Wind", "Temp")], na.rm = TRUE)
rnorm(n = nrow(airquality), mean = mn, sd = 5)
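
Since the question mentions data sets with tens of columns, the same rowSums() idea can be written with a named weight vector so that the columns are not listed by hand. This is only a sketch: the names w, intercept, X, and weighted, and the seed, are hypothetical choices; the weights themselves are taken from the question.

```r
# Named weights: names must match the data frame's column names
w <- c(Ozone = 1.2, Solar.R = 0.5, Wind = 3, Temp = 0.2)
intercept <- 50

# Select the weighted columns and multiply each column by its weight
X <- as.matrix(airquality[names(w)])
weighted <- sweep(X, 2, w, `*`)

# Sum across each row, skipping NA terms, then add the constant
mn <- intercept + rowSums(weighted, na.rm = TRUE)

set.seed(42)  # arbitrary seed, only to make the draw reproducible
airquality$random <- rnorm(n = nrow(airquality), mean = mn, sd = 5)
head(airquality)
```

Adding or removing a predictor then only requires editing w.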