Back again with another question. I ended up tearing down my code from earlier and starting from square one, because I had reached the point where I was buried in errors and this seemed like an easier way to fix it. My function now returns correct values, but only when I have it look at one file (ID). Whenever I attempt to run it over a sequence of files (e.g. 1:10), I get an incorrect answer and this warning:
Warning message: In dat[, "ID"] == ID : longer object length is not a multiple of shorter object length
This is the code (I had originally tried using lapply and sapply with data.table, but that seemed to have opened a whole new can of worms I was not prepared for).
pollutantmean <- function(directory, pollutant, id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE)
  dat <- data.frame()
  for (i in 1:332) {
    dat <- rbind(dat, read.csv(files_list[i]))
  }
  dat_subset <- dat[which(dat[, "ID"] == id), ]
  mean(dat_subset[, pollutant], na.rm = TRUE, useNames = TRUE)
}
When I call my function as
> pollutantmean("./specdata/", "nitrate", 23)
it comes back with 1.280833 which is what I am expecting to see for this call.
However, when I call it as
pollutantmean("./specdata/", "sulfate", 1:10)
it comes back with the previously mentioned warning message.
I don't know if it has something to do with the way I am defining the columns or rows of dat <- data.frame(), or something else that's maybe staring me right in the face.
Answer:
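Replace == with %in%. The == operator compares element by element and recycles the shorter vector, so with id = 1:10 each row's ID is compared against only one position of id, not against all ten values. %in% instead tests whether each ID appears anywhere in id, which is what you want when id is a vector and not a single value.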
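To see the difference on a small scale, here is a minimal sketch using a made-up data frame (the toy values are an assumption, not the real specdata files). Note that == can drop matching rows even when no warning is raised; the warning only shows up when the column length is not a multiple of length(id).

```r
# Toy stand-in for the combined monitor data: 4 monitors, 3 rows each.
dat <- data.frame(ID = rep(1:4, each = 3), sulfate = 1:12)
id <- 1:2

# == recycles id as 1, 2, 1, 2, ... and compares position by position,
# so only some of the rows for monitors 1 and 2 are matched.
n_equal <- sum(dat$ID == id)

# %in% asks, for every row, whether its ID occurs anywhere in id.
n_in <- sum(dat$ID %in% id)

print(c(equal = n_equal, in_set = n_in))  # equal = 4, in_set = 6
```

Monitors 1 and 2 have six rows between them, so only the %in% count is right.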
# which() is not needed: a logical vector can index the rows directly
dat_subset <- dat[dat[,"ID"] %in% id, ]
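For completeness, here is a sketch of how the whole function might look with that change applied. A few things here are my own assumptions and not part of the original code: it reads however many files the directory contains instead of hard-coding 1:332, it builds the data frame with lapply and do.call(rbind, ...), and it drops useNames, which is not an argument of mean(). The temporary directory and the three tiny CSV files are invented purely so the example runs on its own.

```r
pollutantmean <- function(directory, pollutant, id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE)
  # Read every file once and stack the results into one data frame.
  dat <- do.call(rbind, lapply(files_list, read.csv))
  # Membership test: keep rows whose ID appears anywhere in id.
  dat_subset <- dat[dat[, "ID"] %in% id, ]
  mean(dat_subset[, pollutant], na.rm = TRUE)
}

# Hypothetical demo data: three small monitor files in a temporary directory.
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in 1:3) {
  write.csv(
    data.frame(ID = k, sulfate = c(k, k + 1, NA)),
    file.path(demo_dir, sprintf("%03d.csv", k)),
    row.names = FALSE
  )
}

# Monitors 1 and 2 hold the non-missing values 1, 2, 2, 3.
print(pollutantmean(demo_dir, "sulfate", 1:2))  # 2
```

Reading all files and then filtering keeps the structure of the original function. If the files are guaranteed to sort in ID order you could read only files_list[id], but that depends on the file naming, so it is left out here.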