Is there a way to calculate the Z score for all values in a row in a data frame?

Time:11-18

I have a data frame which contains expression levels of a gene in 1677 conditions. I am looking to create a new data frame which has the Z score for each condition. This is the code I have so far:

for (cell_no in 1:ncol(NANOG_data)) {
  z_score[cell_no] <- (NANOG_data[2, cell_no] - rowMeans(NANOG_data)) / rowSds(as.matrix(NANOG_data))
}

And this is what the data frame looks like.

When I run this code, I get this error:

Error: object 'z_score' not found.

Is there a way to more easily populate a new data frame using a for loop, or is there a vectorized function I can run on my original data frame to calculate the Z score for each value?

Answer:

As @GuedesBF commented, posting a screenshot of data is bad practice, and you should avoid it (ref https://xkcd.com/2116/).

I will try to help you with a dummy dataset:

# Generate a one-row matrix of random values, with one column per letter
set.seed(999)
my_dummy_data <- matrix(rnorm(length(letters)), nrow = 1, dimnames = list(1, letters))

> my_dummy_data
           a        b        c         d          e          f         g
1 -0.2817402 -1.31256 0.795184 0.2700705 -0.2773064 -0.5660237 -1.878658
          h          i         j        k         l         m         n
1 -1.266791 -0.9677497 -1.121009 1.325464 0.1339774 0.9387494 0.1725381
          o         p          q         r         s         t         u
1 0.9576504 -1.362686 0.06833513 0.1006576 0.9013448 -2.074357 -1.228563
          v          w         x         y         z
1 0.6430443 -0.3597629 0.2940356 -1.125268 0.6422657

As far as I understand, this has the same structure as your data: a single row in which the column names identify the conditions (e.g. "AAACCCTG...") and the numerical values are the expression levels. I'm not a geneticist, so apologies if I get the terminology wrong :)

Now, I assume you want new values in which each expression level is converted to a z-score by subtracting the mean and dividing by the standard deviation. That can be done with:

my_z_scores <- (my_dummy_data - mean(my_dummy_data)) / sd(my_dummy_data)
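If you still want the for loop from your question, the "object 'z_score' not found" error happens because the loop assigns to `z_score[cell_no]` before `z_score` exists. A minimal sketch of a fixed loop on the dummy matrix: it pre-allocates the result, computes the mean and standard deviation once outside the loop, and checks that the result matches the vectorized version. It uses row 1 of the dummy data; the row index in your own code may differ.

```r
set.seed(999)
my_dummy_data <- matrix(rnorm(length(letters)), nrow = 1,
                        dimnames = list(1, letters))

# Vectorized version, as above
my_z_scores <- (my_dummy_data - mean(my_dummy_data)) / sd(my_dummy_data)

# Loop version
row_values <- my_dummy_data[1, ]        # named numeric vector (names a..z)
row_mean <- mean(row_values)            # computed once, not on every iteration
row_sd <- sd(row_values)
z_score <- numeric(length(row_values))  # create z_score BEFORE indexing into it
names(z_score) <- names(row_values)
for (cell_no in seq_along(row_values)) {
  z_score[cell_no] <- (row_values[cell_no] - row_mean) / row_sd
}

# Both approaches should agree
print(all.equal(z_score, my_z_scores[1, ]))
```

The loop works, but the one-line vectorized expression does the same thing with less code.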

Going beyond your actual question: before doing any further analysis, you might want to reshape your data into a long (columnar) format, with one row per condition:

my_better_dummy_data <- data.frame(gene=colnames(my_dummy_data), expression=as.vector(my_dummy_data) )
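The line above starts from a matrix, while your NANOG_data is a data frame. Assuming it is a data frame with a single row, one way to do the same reshaping is to flatten that row with `unlist()`, which gives a named numeric vector. In this sketch, `nanog_like` and the `cond_` column names are made up for illustration.

```r
# Hypothetical stand-in for NANOG_data: one row (one gene), one column per condition
set.seed(999)
nanog_like <- as.data.frame(matrix(rnorm(6), nrow = 1,
                                   dimnames = list("NANOG", paste0("cond_", 1:6))))

# unlist() flattens the single row into a named numeric vector
expr_vec <- unlist(nanog_like[1, ])

# Long format: one row per condition
nanog_long <- data.frame(condition = names(expr_vec),
                         expression = unname(expr_vec),
                         stringsAsFactors = FALSE)
print(nanog_long)
```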

In this form, the z-scores can be added as a new column:

my_better_dummy_data$z_score <- (my_better_dummy_data$expression - mean(my_better_dummy_data$expression)) / sd(my_better_dummy_data$expression)
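If your data ever holds several genes (one per row), each z-score has to be computed within its own row. Base R's `scale()` standardizes columns, dividing by the same n - 1 standard deviation that `sd()` uses, so a common idiom is to transpose, scale, and transpose back. Here is a sketch using a made-up 3 x 5 matrix; a data frame has to go through `as.matrix()` first. The `apply()` line is only a cross-check that spells out the same formula row by row.

```r
# Hypothetical example: 3 genes (rows) x 5 conditions (columns)
set.seed(1)
expr <- matrix(rnorm(15), nrow = 3,
               dimnames = list(c("geneA", "geneB", "geneC"), paste0("cond_", 1:5)))

# scale() works on columns, so transpose, scale, and transpose back
z_rows <- t(scale(t(expr)))

# Same formula written out explicitly for each row
z_manual <- t(apply(expr, 1, function(x) (x - mean(x)) / sd(x)))

# Starting from a data frame instead of a matrix
expr_df <- as.data.frame(expr)
z_df <- as.data.frame(t(scale(t(as.matrix(expr_df)))))

# scale() attaches extra attributes, so compare values only
print(all.equal(z_rows, z_manual, check.attributes = FALSE))
```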