In R, x and y are arrays (time series data), how to calculate cor() for trend correlation?

Time:08-11

I want to see the correlation of two time series datasets.

df <- data.frame(
    row.names = paste0("s", 1:5),
    R_T1 = 1:5, R_T2 = 2:6, R_T3 = 3:7,
    P_T1 = 4:8, P_T2 = 5:9, P_T3 = 6:10
)

That looks like

   R_T1 R_T2 R_T3 P_T1 P_T2 P_T3
s1    1    2    3    4    5    6
s2    2    3    4    5    6    7
s3    3    4    5    6    7    8
s4    4    5    6    7    8    9
s5    5    6    7    8    9   10

or

       R   P
s1 T1  1   4    
s1 T2  2   5    
s1 T3  3   6  
s2 T1  2   5    
s2 T2  3   6    
s2 T3  4   7

s1 to s5 are sample names; R and P are two variables, each measured at three time points (T1 to T3). For each sample, I want to calculate cor(c(R_T1, R_T2, R_T3), c(P_T1, P_T2, P_T3)). For example, for s1 that is cor(c(1, 2, 3), c(4, 5, 6)), and not cor(R_T1, P_T1), cor(R_T2, P_T2), and so on. Is the second table clearer? The purpose is to calculate the trend correlation of R and P within each sample.

How can I achieve this?

CodePudding user response:

First of all, hi!

To begin with, this is not the best way to present either your data or your question.

If I understood your data structure, it should look like the following:

df<-structure(list(R_T1 = c(1, 2, 3, 4, 5),
               R_T2 = c(2, 3, 4, 5, 6),
               R_T3 = c(3, 4, 5, 6, 7),
               P_T1 = c(4, 5, 6, 7, 8),
               P_T2 = c(5, 6, 7, 8, 9),
               P_T3 = c(6, 7, 8, 9, 10)),
          row.names = c("s1", "s2", "s3", "s4", "s5"),
          class = "data.frame")

Then (again, if I understood correctly) you want the correlation of R vs. P at each time point:

CorrelationT1 <- cor(df$R_T1,df$P_T1)
CorrelationT2 <- cor(df$R_T2,df$P_T2)
CorrelationT3 <- cor(df$R_T3,df$P_T3)
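The three assignments above can also be written as a single loop over the time-point labels. This is only a sketch: the names time_points and cor_by_time are made up for illustration, and it assumes the columns follow the R_<time> / P_<time> naming used above.

```r
# Example data, using the column names from the reconstruction above.
df <- data.frame(
  row.names = paste0("s", 1:5),
  R_T1 = 1:5, R_T2 = 2:6, R_T3 = 3:7,
  P_T1 = 4:8, P_T2 = 5:9, P_T3 = 6:10
)

# Hypothetical helper vector: the time-point suffixes shared by R and P.
time_points <- c("T1", "T2", "T3")

# For each time point, correlate the R column with the P column across samples.
cor_by_time <- sapply(time_points, function(tp) {
  cor(df[[paste0("R_", tp)]], df[[paste0("P_", tp)]])
})
cor_by_time
```

The result is a numeric vector named by time point, which can be passed straight to plot() instead of three separate variables.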

The problem here is that your example data are perfectly correlated (each of these correlations is 1), so here is some random data to check against:

set.seed(42)  # fix the seed so the random example is reproducible
dfrnorm<-structure(list(R_T1 = rnorm(5),
               R_T2 = rnorm(5),
               R_T3 = rnorm(5),
               P_T1 = rnorm(5),
               P_T2 = rnorm(5),
               P_T3 = rnorm(5)),
          row.names = c("s1", "s2", "s3", "s4", "s5"),
          class = "data.frame")

With the respective correlations,

CorrelationT1 <- cor(dfrnorm$R_T1,dfrnorm$P_T1)
CorrelationT2 <- cor(dfrnorm$R_T2,dfrnorm$P_T2)
CorrelationT3 <- cor(dfrnorm$R_T3,dfrnorm$P_T3)

And a simple plot could be like this,

plot(1:3,
     c(CorrelationT1,CorrelationT2,CorrelationT3),
     xlab="Time Points", ylab="Correlation")

I hope this will help you,

Cheers

CodePudding user response:

1) The question indicates you want the correlation between R and P for each sample so we are looking for 5 correlations corresponding to the 5 samples. Create two data frames each of which has one column per sample with rows corresponding to time. Then use mapply to get the correlations of the first column of R with the first column of P, the second column of R with the second column of P, etc.

isR <- startsWith(names(df), "R")
R <- as.data.frame(t(df[isR]))
P <- as.data.frame(t(df[!isR]))
mapply(cor, R, P)
## s1 s2 s3 s4 s5 
##  1  1  1  1  1 

2) It could also be written like this:

spl <- split(as.data.frame(t(df)), startsWith(names(df), "R"))
do.call("mapply", c(cor, unname(spl)))
## s1 s2 s3 s4 s5 
##  1  1  1  1  1 

3) or using the native pipe (the _ placeholder requires R 4.2 or later):

df |>
  t() |>
  as.data.frame() |>
  list(. = _) |>
  with(split(., startsWith(rownames(.), "R"))) |>
  unname() |>
  c(FUN = cor) |>
  do.call(what = "mapply")
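As a further alternative, not taken from the answers above, the per-sample correlation can be computed row by row on the wide table, or per group on the long table from the question. The sketch below assumes the R and P columns appear in the same time order; the names cor_wide, cor_long and long are illustrative only.

```r
# Example data in the wide layout from the question.
df <- data.frame(
  row.names = paste0("s", 1:5),
  R_T1 = 1:5, R_T2 = 2:6, R_T3 = 3:7,
  P_T1 = 4:8, P_T2 = 5:9, P_T3 = 6:10
)

# Wide layout: walk over the rows and correlate the R values with the P values.
# Assumes the R and P columns are listed in the same time order.
isR <- startsWith(names(df), "R")
cor_wide <- apply(as.matrix(df), 1, function(row) cor(row[isR], row[!isR]))

# Long layout: one row per sample and time point, as in the question's second table.
long <- data.frame(
  sample = rep(rownames(df), each = 3),
  time   = rep(c("T1", "T2", "T3"), times = nrow(df)),
  R      = as.vector(t(df[isR])),
  P      = as.vector(t(df[!isR]))
)

# Split the long table by sample and correlate R with P inside each piece.
cor_long <- sapply(split(long, long$sample), function(d) cor(d$R, d$P))

cor_wide
cor_long
```

Both vectors are named s1 to s5 and should agree with the mapply() results. Note that a sample whose R or P values are constant over time has zero variance, so cor() returns NA with a warning for that sample.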