I have a spreadsheet where some of the values are entered as "N/A", and some of the cells are blank.
joe | pete | mark | Average ⬇️ (the average per row) | |
---|---|---|---|---|
90 | 85 | N/A | 87.5 | |
N/A | 92 | 92 | ||
88 | 90 | 89 | ||
3 | 2 | 2 | 3 | <-- This row Counts all non-blank values in each column |
I want to import these into R to do two things:
- Get an average of these values for each row across multiple columns and
- get a count of the values per column (see below for example)
The problem is: I want to be able to count all the non-blank cells (including those with "N/A" values, as they are actually important part of the data and are different from blanks
What I tried: Replaced the "N/A" values in Excel before I imported into R by changing the "n/a"s to 0's, so I can import the columns as numbers, but the problem is, then my averages are messed up. If I add a 0 to the first row, for example, then my average is 58.33 (90 85 0)/3 = 58.33 That is not what I want. I want an average of only those that are not "N/A".
The other issue I have, is that if I leave those as N/A, then I can get a count, but my columns are not numeric anymore and I can't perform an average calculaiton.
I know I can do this easily in excel with =COUNTA and =AVERAGE, but I would prefer to do as much wrangling as possible in R. Any suggesitons??
Thanks!!
CodePudding user response:
try something like this. The na.rm=TRUE
should be what you want:
example_data = c(90, 85, NA)
MEAN = mean(example_data, na.rm=TRUE)
CodePudding user response:
base R
dat$avg <- mapply(function(...) {
dots <- unlist(list(...))
mean(suppressWarnings(as.numeric(dots[nzchar(dots)])),
na.rm = TRUE)
}, dat$joe, dat$pete, dat$mark)
dat
# joe pete mark avg
# 1 90 85 <NA> 87.5
# 2 NA 92 92.0
# 3 88 90 89.0
as.data.frame(lapply(dat, function(z) sum(nzchar(z))))
# joe pete mark avg
# 1 3 2 2 3
dat <- rbind(dat, as.data.frame(lapply(dat, function(z) sum(nzchar(z)))))
dat
# joe pete mark avg
# 1 90 85 <NA> 87.5
# 2 NA 92 92.0
# 3 88 90 89.0
# 4 3 2 2 3.0
Data
dat <- structure(list(joe = c(90, NA, 88), pete = c("85", "", "90"), mark = c(NA, "92", "")), class = "data.frame", row.names = c(NA, -3L))