I have a large data frame, ExprsData, with several numeric and NA values. It looks something like this:
Patient | Gene_A | Gene_C | Gene_D |
---|---|---|---|
patient1 | 12 | 16 | NA |
patient2 | 15 | NA | 20 |
My data frame has 15 rows and 14 columns.
I have made a function that is meant to scale and center the values in my data frame:
MyScale <- function (x, scale, center){
  removena <- x[!is.na(x)]       # remove the NA values
  meanofdata <- mean(removena)   # calc the mean
  stdofdata <- sd(removena)      # calc the std
  if (scale == TRUE){            # if scale is true
    calcvec <- (removena - meanofdata)/stdofdata
    return(calcvec)
  } else if (center == TRUE){    # if center is true
    centervec <- removena - meanofdata
    return(centervec)
  }
}
I tested my function on a single column of my data frame like this:
MyScale (ExprsData$Gene_C, scale = TRUE, center = TRUE)
It works great!
Next, I want to be able to apply my function to my entire data frame, have it output as a data frame, assign it to an object and then save as a csv.
To do this I tried this:
ExprsDataScaled <- as.data.frame(lapply(ExprsData, function(x) MyScale(x = x, scale = TRUE, center = TRUE)))
write.csv(ExprsDataScaled,"?path//filename.csv", row.names = TRUE)
However, when I try to apply my function to my entire data frame, I get the following error: Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 14, 15
I understand that I am getting this error message because my columns differ in length. I know this is because in my function, I have it remove the NA values. I need to do this because otherwise I run into a lot of errors when I try to scale and center later in the function.
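The length mismatch described above can be reproduced with a small sketch. The toy data frame and its values are made up for illustration and are not the real ExprsData:

```r
# Toy data with a different number of NAs per column (illustrative values only)
toy <- data.frame(Gene_C = c(16, NA, 11), Gene_D = c(NA, 20, NA))

# Dropping the NAs shortens each column by a different amount
remaining <- lengths(lapply(toy, function(x) x[!is.na(x)]))
print(remaining)

# Gene_C keeps 2 values and Gene_D keeps 1, so as.data.frame()
# cannot line the columns up into rows of equal length
```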
Is there a way to make a data frame with unequal columns? Is there a way to re-insert "NA" back into my data frame once it has been scaled and centered to avoid this error? Or a way to insert blank cells in some columns so they can all be the same length?
CodePudding user response:
This is a better version of your function that does not remove any NA values from your data, so every column keeps its original length.
(However, the function will still fail on non-numeric values for x, and it returns NULL when scale and center are both FALSE. One might also ask why a scaling function needs a yes/no scale parameter at all.)
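MyScale <- function (x, scale, center){
  # na.rm = TRUE ignores NAs in the statistics but leaves x untouched
  meanofdata <- mean(x, na.rm = TRUE)
  stdofdata <- sd(x, na.rm = TRUE)
  if (scale == TRUE){
    # NA entries stay NA, so the output has the same length as x
    calcvec <- (x - meanofdata)/stdofdata
    return(calcvec)
  } else if (center == TRUE){
    centervec <- x - meanofdata
    return(centervec)
  }
}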
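Because the output now has the same length as the input, the function can be applied to every column and the result stays a data frame. The following is a minimal sketch, not a definitive solution. It assumes Patient is an ordinary column rather than row names, uses a made-up toy stand-in for ExprsData, and writes to a temporary file as a hypothetical destination:

```r
# NA-preserving scaler with the same logic as MyScale above
MyScale <- function(x, scale, center) {
  meanofdata <- mean(x, na.rm = TRUE)
  stdofdata <- sd(x, na.rm = TRUE)
  if (scale == TRUE) {
    (x - meanofdata) / stdofdata
  } else if (center == TRUE) {
    x - meanofdata
  }
}

# Toy stand-in for ExprsData (values are made up for illustration)
ExprsData <- data.frame(
  Patient = c("patient1", "patient2", "patient3"),
  Gene_A = c(12, 15, 18),
  Gene_C = c(16, NA, 10),
  Gene_D = c(NA, 20, 24)
)

# Scale only the numeric columns and leave Patient untouched
ExprsDataScaled <- ExprsData
is_num <- vapply(ExprsData, is.numeric, logical(1))
ExprsDataScaled[is_num] <- lapply(ExprsData[is_num], MyScale,
                                  scale = TRUE, center = TRUE)

# Save the result; tempfile() is a placeholder for a real path
out_file <- tempfile(fileext = ".csv")
write.csv(ExprsDataScaled, out_file, row.names = FALSE)
```

The NA cells stay NA in ExprsDataScaled, so no values need to be re-inserted afterwards. Base R's scale() also leaves NAs out of the column means and standard deviations, though it returns a matrix rather than a data frame.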