In R, excel COUNTA equivalent, or another way to calculate averages in "char" columns?-CodePudding

I have a spreadsheet where some of the values are entered as "N/A", and some of the cells are blank.

joe	pete	mark	Average ⬇️ (the average per row)
90	85	N/A	87.5
N/A		92	92
88	90		89
3	2	2	3	<-- This row Counts all non-blank values in each column

I want to import these into R to do two things:

Get an average of these values for each row across multiple columns and
get a count of the values per column (see below for example)

The problem is: I want to be able to count all the non-blank cells (including those with "N/A" values, as they are actually important part of the data and are different from blanks

What I tried: Replaced the "N/A" values in Excel before I imported into R by changing the "n/a"s to 0's, so I can import the columns as numbers, but the problem is, then my averages are messed up. If I add a 0 to the first row, for example, then my average is 58.33 (90 85 0)/3 = 58.33 That is not what I want. I want an average of only those that are not "N/A".

The other issue I have, is that if I leave those as N/A, then I can get a count, but my columns are not numeric anymore and I can't perform an average calculaiton.

I know I can do this easily in excel with =COUNTA and =AVERAGE, but I would prefer to do as much wrangling as possible in R. Any suggesitons??

Thanks!!

CodePudding user response：

try something like this. The na.rm=TRUE should be what you want:

example_data = c(90, 85, NA)
MEAN = mean(example_data, na.rm=TRUE)

CodePudding user response：

base R

dat$avg <- mapply(function(...) {
  dots <- unlist(list(...))
  mean(suppressWarnings(as.numeric(dots[nzchar(dots)])),
       na.rm = TRUE)
}, dat$joe, dat$pete, dat$mark)
dat
#   joe pete mark  avg
# 1  90   85 <NA> 87.5
# 2  NA        92 92.0
# 3  88   90      89.0
as.data.frame(lapply(dat, function(z) sum(nzchar(z))))
#   joe pete mark avg
# 1   3    2    2   3
dat <- rbind(dat, as.data.frame(lapply(dat, function(z) sum(nzchar(z)))))
dat
#   joe pete mark  avg
# 1  90   85 <NA> 87.5
# 2  NA        92 92.0
# 3  88   90      89.0
# 4   3    2    2  3.0

Data

dat <- structure(list(joe = c(90, NA, 88), pete = c("85", "", "90"), mark = c(NA, "92", "")), class = "data.frame", row.names = c(NA, -3L))