Co-occurrence calculations from a presence-absence matrix-CodePudding

I have this great presence-absence dataset in which I need to calculate a C score (CS) for each species pair (BABO, BW, RC, SKS, MANG) to measure species co-occurrences.

ki and kj denote the numbers of occurrences of species i and j and K is the number of co-occurrences of both species.

I have looked at articles like Find the pair of most correlated variables and Turning co-occurrence counts into co-occurrence probabilities with cascalog but was unable to determine the most efficient way to go about it in R. I have tried creating a function but was unsuccessful.

My data:

data <- structure(list(group_id = c("2008-2-11.C3_900", "2008-2-11.C3_960", 
"2008-2-11.C3_1200", "2008-2-11.C3_1230", "2008-2-11.C3_1460", 
"2008-2-11.C3_1490", "2008-2-22.Mwani_0", "2008-2-22.Mwani_110", 
"2008-2-22.Mwani_600", "2008-2-22.Mwani_1650", "2008-2-20.Sanje_150", 
"2008-2-20.Sanje_410", "2008-2-20.Sanje_3000", "2008-5-9.C3_900", 
"2008-5-13.Mwani_750", "2008-5-13.Mwani_800", "2008-5-13.Mwani_900", 
"2008-5-13.Mwani_1080", "2008-5-13.Mwani_1800", "2008-5-13.Mwani_2200", 
"2008-5-13.Mwani_2900"), BABO = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0), BW = c(1, 1, 1, 1, 0, 0, 
0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1), RC = c(0, 0, 1, 
1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0), SKS = c(0, 
1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0), 
    MANG = c(0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 1)), row.names = c(NA, -21L), class = c("tbl_df", 
"tbl", "data.frame"))

group_id               BABO BW RC SKS MANG
<chr>                  <dbl><dbl><dbl><dbl>
2008-2-11.C3_900        0   1   0   0   0
2008-2-11.C3_960        0   1   0   1   0
2008-2-11.C3_1200       0   1   1   1   0
2008-2-11.C3_1230       0   1   1   0   0
2008-2-11.C3_1460       0   0   1   0   0
2008-2-11.C3_1490       0   0   0   1   0
2008-2-22.Mwani_0       0   0   0   1   1
2008-2-22.Mwani_110     0   0   1   0   1
2008-2-22.Mwani_600     0   1   0   0   0
2008-2-22.Mwani_1650    0   0   0   1   0
2008-2-20.Sanje_150     0   1   0   1   0
2008-2-20.Sanje_410     0   0   1   0   0
2008-2-20.Sanje_3000    0   0   1   0   0
2008-5-9.C3_900         0   0   0   1   0
2008-5-13.Mwani_750     0   0   0   1   0
2008-5-13.Mwani_800     1   0   1   0   0
2008-5-13.Mwani_900     0   0   1   0   0
2008-5-13.Mwani_1080    0   1   1   0   0
2008-5-13.Mwani_1800    0   1   0   0   0
2008-5-13.Mwani_2200    0   1   0   0   0
2008-5-13.Mwani_2900    0   1   0   0   1

CodePudding user response：

First define a function to compute CS for a given pair of species; then use combn() to generate all possible pairs; then pass each pair to the CS() function using purrr::map2_dbl().

library(dplyr)
library(purrr)

CS <- function(i, j, data) {
  K <- sum(data[[i]] == 1 & data[[j]] == 1)
  ki <- sum(data[[i]])
  kj <- sum(data[[j]])
  ((ki - K) * (kj - K)) / (ki * kj)
}

names(sp_data)[-1] %>%
  combn(2) %>%
  t() %>%
  as_tibble() %>%
  set_names(c("i", "j")) %>%
  mutate(CS = map2_dbl(i, j, CS, data = sp_data))

# A tibble: 10 × 3
   i     j        CS
   <chr> <chr> <dbl>
 1 BABO  BW    1    
 2 BABO  RC    0    
 3 BABO  SKS   1    
 4 BABO  MANG  1    
 5 BW    RC    0.467
 6 BW    SKS   0.438
 7 BW    MANG  0.6  
 8 RC    SKS   0.778
 9 RC    MANG  0.593
10 SKS   MANG  0.583

(Side note: using the equation you provided, species with little co-occurrence end up with high CS scores, and vice versa. If this isn’t what you expected, perhaps your equation should be subtracted from 1?)

CodePudding user response：

In base R, we may use crossprod

t1 <- crossprod(as.matrix(data[-1]))
v1 <- diag(t1)
c1 <- v1[col(t1)]
r1 <- v1[row(t1)]
((c1 - t1) * (r1 - t1))/(c1 * r1)

-output

     BABO        BW        RC       SKS      MANG
BABO    0 1.0000000 0.0000000 1.0000000 1.0000000
BW      1 0.0000000 0.4666667 0.4375000 0.6000000
RC      0 0.4666667 0.0000000 0.7777778 0.5925926
SKS     1 0.4375000 0.7777778 0.0000000 0.5833333
MANG    1 0.6000000 0.5925926 0.5833333 0.0000000