I am trying to do compute some correlations between the first 3 countries below (Germany, Italy, Japan) vs the 2nd 3 countries (US, Canada, UK).
So for example, it would be Germany vs US, Germany vs Canada, Germany vs UK, then Italy vs US, Italy, vs Canada, Italy vs UK and so on.
However, as my data do have some gaps in the first few rows (like in Germany & Japan), the usual corr function would not work. So in this case, the data should skip the first 2 rows of Germany & first 3 of Japan and start on row 3 & 4 for Germany & Japan, to compare with similar rows for US/Canada/UK, while having the correlation for full data for Italy.
So I was wondering how should I do this?
Thanks
df11 <-
tibble(
date = 2001:2010,
Germany = runif(10),
Italy = runif(10),
Japan = runif(10),
US = runif(10),
Canada = runif(10),
UK = runif(10)
)
df11$Germany[1:2] <- NA
df11$Japan[1:3] <- NA
CodePudding user response:
From mycor
function you can find in r-bloggers,
mycor<- function(x,...){
r<- apply(x, 2, function(j){
apply(x, 2, function(i){
as.numeric(cor.test(i,j)$estimate)
})
})
P<- apply(x, 2, function(j){
apply(x, 2, function(i){
as.numeric(cor.test(i,j)$p.value)
})
})
out<-c()
out$P<- P
out$r<- r
return(out)
}
mycor(df11)$r
date Germany Italy Japan US Canada UK
date 1.00000000 0.3829687 -0.09309048 -0.46562050 -0.44324591 0.41293491 0.7908250
Germany 0.38296868 1.0000000 0.32186956 -0.19135611 -0.49111087 -0.38151625 0.6377928
Italy -0.09309048 0.3218696 1.00000000 0.04341171 0.09589073 -0.32724552 0.2138135
Japan -0.46562050 -0.1913561 0.04341171 1.00000000 0.52800797 0.25226383 -0.4802936
US -0.44324591 -0.4911109 0.09589073 0.52800797 1.00000000 0.08124373 -0.4869257
Canada 0.41293491 -0.3815163 -0.32724552 0.25226383 0.08124373 1.00000000 0.1033160
UK 0.79082499 0.6377928 0.21381349 -0.48029364 -0.48692567 0.10331600 1.0000000
CodePudding user response:
Using cor
in combn
. Instead of df[-1]
, which just excludes the date column, you could explicitly define e.g. df11[c("Germany", "Italy", "Japan")]
to make a specific selection of countries. use="complete.obs"
includes just complete observations, see documentation ?cor
for other options.
res <- combn(df11[-1], 2, cor, use="complete.obs", simplify=FALSE)
Result
res
# [[1]]
# Germany Italy
# Germany 1.00000000 -0.08586634
# Italy -0.08586634 1.00000000
#
# [[2]]
# Germany Japan
# Germany 1.0000000 0.3699611
# Japan 0.3699611 1.0000000
#
# [[3]]
# Germany US
# Germany 1.00000000 0.07002937
# US 0.07002937 1.00000000
#
# [[4]]
# Germany Canada
# Germany 1.0000000 0.3949677
# Canada 0.3949677 1.0000000
#
# [[5]]
# Germany UK
# Germany 1.0000000 0.5248062
# UK 0.5248062 1.0000000
#
# [[6]]
# Italy Japan
# Italy 1.00000000 0.09700777
# Japan 0.09700777 1.00000000
#
# [[7]]
# Italy US
# Italy 1.0000000 -0.1351394
# US -0.1351394 1.0000000
#
# [[8]]
# Italy Canada
# Italy 1.0000000 -0.1587657
# Canada -0.1587657 1.0000000
#
# [[9]]
# Italy UK
# Italy 1.0000000 -0.5379418
# UK -0.5379418 1.0000000
#
# [[10]]
# Japan US
# Japan 1.000000 -0.744641
# US -0.744641 1.000000
#
# [[11]]
# Japan Canada
# Japan 1.0000000 0.5813378
# Canada 0.5813378 1.0000000
#
# [[12]]
# Japan UK
# Japan 1.0000000 -0.1877573
# UK -0.1877573 1.0000000
#
# [[13]]
# US Canada
# US 1.0000000 -0.5947739
# Canada -0.5947739 1.0000000
#
# [[14]]
# US UK
# US 1.00000000 0.01007044
# UK 0.01007044 1.00000000
#
# [[15]]
# Canada UK
# Canada 1.0000000 0.3784406
# UK 0.3784406 1.0000000
If you are interested in the coefficients rather than the matrices, just do.
sapply(res, `[`, 2)
# [1] -0.08586634 0.36996112 0.07002937 0.39496772 0.52480618 0.09700777
# [7] -0.13513936 -0.15876568 -0.53794180 -0.74464099 0.58133779 -0.18775725
# [13] -0.59477393 0.01007044 0.37844058
Or if you want the matrix,
cor(df11[-1], use="complete.obs")
# Germany Italy Japan US Canada UK
# Germany 1.00000000 0.05086447 0.36996112 -0.02672511 0.3531181 0.4261229
# Italy 0.05086447 1.00000000 0.09700777 -0.03657892 -0.1514971 -0.6038602
# Japan 0.36996112 0.09700777 1.00000000 -0.74464099 0.5813378 -0.1877573
# US -0.02672511 -0.03657892 -0.74464099 1.00000000 -0.6574211 0.2106356
# Canada 0.35311813 -0.15149706 0.58133779 -0.65742113 1.0000000 0.2452463
# UK 0.42612287 -0.60386019 -0.18775725 0.21063562 0.2452463 1.0000000
If you aim to use cor.test
you can use the names
in combn
and paste them as.formula
using sprintf
. See above how to get just a subset.
res2 <- combn(names(df11[-1]), 2, \(x)
cor.test(as.formula(sprintf('~ %s %s', x[1], x[2])), data=df11),
simplify=F)
Result 2
res2[[1]] ## first result
# Pearson's product-moment correlation
#
# data: Germany and Italy
# t = -0.21111, df = 6, p-value = 0.8398
# alternative hypothesis: true correlation is not equal to 0
# 95 percent confidence interval:
# -0.7454347 0.6586606
# sample estimates:
# cor
# -0.08586634