Am trying to compute the correlations of the below countries, with USA. I have a relatively big dataset with 80 variables & 3000 observations in my first df as below, so am trying to use R to automate this instead of using excel.
I am trying to compute correlations for the countries in the first df (i.e. Germany, Italy, Japan and more) with USA in the 2nd df. So it should go Germany - USA, Italy - USA, Japan - USA and so on.
Not too sure how should I begin - should I loop every column in the first table to correlate with USA in the 2nd? Help is much appreciated.
Thanks!
df1
Date | Germany | Italy | Japan | More countries... |
---|---|---|---|---|
01-01-2020 | 1000 | 200 | 2304 | More numbers... |
01-02-2020 | 2000 | 389 | 2098 | More numbers... |
and on and on
df2
Date | USA |
---|---|
01-01-2020 | 500 |
01-02-2020 | 600 |
and on and on
CodePudding user response:
Something like that should do the trick:
df1 <-
tibble(
date = 2001:2010,
Germany = runif(10),
Italy = runif(10),
Japan = runif(10)
)
df2 <-
tibble(
date = 2001:2010,
USA = runif(10)
)
df.cor <-
df1 %>%
summarise(across(-one_of('date'), ~ cor(.x, df2$USA)))
df.cor
Note: You have to be sure that dates are consistent between df1
and df2
. You can use join function (e.g. left_join
) to ensure this
CodePudding user response:
You could use this approach:
countries = c("Germany", "Italy", "Japan")
left_join(df1, df2) %>% summarise(across(countries, ~cor(., USA)))
left_join
mergesdf1
anddf2
together so that the dates always match up with one anothersummarise
allows you to perform column-wise operationsacross
tells you which columns you want to make a correlation with USA~cor(., USA)
says take each country and perform the correlation with USA
Germany Italy Japan
<dbl> <dbl> <dbl>
1 -0.393 -0.147 -0.214
Thank you Damien Georges for the data.