I want to see the correlation of two time series datasets.
df <- data.frame(
row.names = paste0("s", 1:5),
R_T1 = 1:5, R_T2 = 2:6, R_T3 = 3:7,
P_T1 = 4:8, P_T2 = 5:9, P_T3 = 6:10
)
That looks like
   R_T1 R_T2 R_T3 P_T1 P_T2 P_T3
s1    1    2    3    4    5    6
s2    2    3    4    5    6    7
s3    3    4    5    6    7    8
s4    4    5    6    7    8    9
s5    5    6    7    8    9   10
or
      R P
s1 T1 1 4
s1 T2 2 5
s1 T3 3 6
s2 T1 2 5
s2 T2 3 6
s2 T3 4 7
s1-s5 are sample names; R and P are two variables, each observed at 3 time points (T1-T3).
What I want to calculate is cor(c(R_T1,R_T2,R_T3), c(P_T1,P_T2,P_T3)) for each sample.
For example, for s1: cor(c(1,2,3), c(4,5,6)), but not cor(R_T1,P_T1), cor(R_T2,P_T2), ...
Is the second table clearer?
The purpose is to calculate the trend correlation of R and P.
How can I achieve this?
CodePudding user response:
First of all, hi!
To begin with, this is not the best way to present your data or your question.
If I understood your data structure, it should look like the following:
df<-structure(list(R_T1 = c(1, 2, 3, 4, 5),
R_T2 = c(2, 3, 4, 5, 6),
R_T3 = c(3, 4, 5, 6, 7),
P_T1 = c(4, 5, 6, 7, 8),
P_T2 = c(5, 6, 7, 8, 9),
P_T3 = c(6, 7, 8, 9, 10)),
row.names = c("s1", "s2", "s3", "s4", "s5"),
class = "data.frame")
Then (again if I understood correctly), you want the correlation of R vs. P for each time point!
CorrelationT1 <- cor(df$R_T1,df$P_T1)
CorrelationT2 <- cor(df$R_T2,df$P_T2)
CorrelationT3 <- cor(df$R_T3,df$P_T3)
The problem here is that your example data are perfectly correlated, so here is some random data to check against; please see below,
dfrnorm<-structure(list(R_T1 = rnorm(5),
R_T2 = rnorm(5),
R_T3 = rnorm(5),
P_T1 = rnorm(5),
P_T2 = rnorm(5),
P_T3 = rnorm(5)),
row.names = c("s1", "s2", "s3", "s4", "s5"),
class = "data.frame")
With the respective correlations,
CorrelationT1 <- cor(dfrnorm$R_T1,dfrnorm$P_T1)
CorrelationT2 <- cor(dfrnorm$R_T2,dfrnorm$P_T2)
CorrelationT3 <- cor(dfrnorm$R_T3,dfrnorm$P_T3)
And a simple plot could be like this,
plot(1:3,
c(CorrelationT1,CorrelationT2,CorrelationT3),
xlab="Time Points", ylab="Correlation")
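The three per-time-point calls above could also be collapsed into a single sapply() over the time labels. This is only a sketch: set.seed() is an addition here to make the random example reproducible, and the names tp and cors are hypothetical helpers, not part of the original answer.

```r
# Assumption: a seed is set only so the random example is reproducible.
set.seed(1)

# Random data with the same layout as dfrnorm above (5 samples, R and P at T1..T3).
dfrnorm <- as.data.frame(matrix(
  rnorm(30), nrow = 5,
  dimnames = list(paste0("s", 1:5),
                  c(paste0("R_T", 1:3), paste0("P_T", 1:3)))
))

# Hypothetical helper vector of time labels.
tp <- paste0("T", 1:3)

# One correlation per time point: R_Tk against P_Tk across the 5 samples.
cors <- sapply(tp, function(k) {
  cor(dfrnorm[[paste0("R_", k)]], dfrnorm[[paste0("P_", k)]])
})
cors
```

The result is a named vector (T1, T2, T3) that can be passed straight to plot(seq_along(cors), cors, ...).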
I hope this will help you,
Cheers
CodePudding user response:
1) The question indicates you want the correlation between R and P for each sample so we are looking for 5 correlations corresponding to the 5 samples. Create two data frames each of which has one column per sample with rows corresponding to time. Then use mapply to get the correlations of the first column of R with the first column of P, the second column of R with the second column of P, etc.
isR <- startsWith(names(df), "R")
R <- as.data.frame(t(df[isR]))
P <- as.data.frame(t(df[!isR]))
mapply(cor, R, P)
## s1 s2 s3 s4 s5
## 1 1 1 1 1
2) It could also be written like this:
spl <- split(as.data.frame(t(df)), startsWith(names(df), "R"))
do.call("mapply", c(cor, unname(spl)))
## s1 s2 s3 s4 s5
## 1 1 1 1 1
3) or using pipes (the _ placeholder requires R 4.2 or later):
df |>
t() |>
as.data.frame() |>
list(. = _) |>
with(split(., startsWith(rownames(.), "R"))) |>
unname() |>
c(FUN = cor) |>
do.call(what = "mapply")
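As a cross-check on the approaches above, the same per-sample correlations can be computed row by row without transposing. This is a minimal sketch, assuming the last column is named P_T3 and that the R and P columns appear in the same time order; row_cor is a hypothetical name.

```r
# Toy data from the question (assumption: last column named P_T3).
df <- data.frame(
  row.names = paste0("s", 1:5),
  R_T1 = 1:5, R_T2 = 2:6, R_T3 = 3:7,
  P_T1 = 4:8, P_T2 = 5:9, P_T3 = 6:10
)

# Logical index of the R columns; the remaining columns are P.
isR <- startsWith(names(df), "R")

# For each sample (row), correlate its R values over time with its P values over time.
# This relies on the R and P columns being in the same time order.
row_cor <- sapply(rownames(df), function(s) {
  cor(unlist(df[s, isR]), unlist(df[s, !isR]))
})
row_cor
```

For s1 this evaluates cor(c(1, 2, 3), c(4, 5, 6)), which is the calculation the question asks for. With real data, a sample whose R or P values are constant over time has zero variance, so cor() returns NA with a warning for that sample.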