Bray-Curtis dissimilarity

Time:12-16

I am trying to calculate the Bray-Curtis dissimilarity between two communities, but the data frames for the two communities do not have the same dimensions. Each data frame has one column with the species names and one with the abundances.

Community 1:

structure(list(taxon = c("Acroloxus lacustris", "Ancylus fluviatilis", 
"Asellus aquaticus", "Bithynia tentaculata", "Bryozoa Gen. sp.", 
"Chironomidae Gen. sp.", "Ephydatia fluviatilis", "Erpobdella octoculata", 
"Glossiphonia complanata", "Physella acuta", "Plumatella fungosa", 
"Plumatella repens", "Radix balthica/labiata", "Sphaerium corneum", 
"Spongilla lacustris", "Spongillidae Gen. sp.", "Tubificidae Gen. sp."
), abundance = c(88, 192, 930, 19, 52, 6, 28, 471, 1, 5, 27, 
8, 439, 65, 6, 85, 1)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -17L))

Community 2:

structure(list(taxon = c("Acroloxus lacustris", "Agraylea sp.", 
"Alboglossiphonia heteroclita", "Ancylus fluviatilis", "Asellus aquaticus", 
"Bithynia tentaculata", "Bryozoa Gen. sp.", "Chironomidae Gen. sp.", 
"Coenagrionidae Gen. sp.", "Dendrocoelum lacteum", "Dreissena polymorpha", 
"Dugesia lugubris", "Dugesia lugubris/polychroa", "Dugesia tigrina", 
"Ephydatia fluviatilis", "Erpobdella octoculata", "Fredericella sultana", 
"Gammarus pulex", "Gammarus sp.", "Glossiphonia complanata", 
"Hydropsyche contubernalis ssp.", "Hydropsyche sp.", "Hydroptila sp.", 
"Physa fontinalis", "Physella acuta", "Planaria torva", "Platycnemis pennipes", 
"Plumatella fungosa", "Polycelis nigra", "Polycelis nigra/tenuis", 
"Radix balthica", "Radix balthica/labiata", "Sphaerium corneum", 
"Spongilla lacustris", "Spongillidae Gen. sp.", "Stylaria lacustris", 
"Tinodes waeneri ssp.", "Valvata piscinalis piscinalis", "Viviparus viviparus"
), abundance = c(109, 100, 1, 368, 1599, 402, 610, 92, 1, 16, 
3182, 50, 1, 241, 6, 1400, 20, 59, 6, 46, 20, 670, 65, 83, 24, 
1, 1, 6, 7, 7, 84, 991, 206, 7, 708, 1, 20, 1, 6)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -39L)) 

How can I calculate the dissimilarity between these communities? All help is appreciated.

library(vegan)
vegdist(data, method ="bray")

CodePudding user response:

Sure, just set the abundance to 0 for every taxon that was not observed in one of the samples. Join the two tables so that a single vegan call returns the dissimilarity matrix:

data1 <- structure(list(taxon = c(
  "Acroloxus lacustris", "Ancylus fluviatilis",
  "Asellus aquaticus", "Bithynia tentaculata", "Bryozoa Gen. sp.",
  "Chironomidae Gen. sp.", "Ephydatia fluviatilis", "Erpobdella octoculata",
  "Glossiphonia complanata", "Physella acuta", "Plumatella fungosa",
  "Plumatella repens", "Radix balthica/labiata", "Sphaerium corneum",
  "Spongilla lacustris", "Spongillidae Gen. sp.", "Tubificidae Gen. sp."
), abundance = c(
  88, 192, 930, 19, 52, 6, 28, 471, 1, 5, 27,
  8, 439, 65, 6, 85, 1
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -17L))

data2 <- structure(list(taxon = c(
  "Acroloxus lacustris", "Agraylea sp.",
  "Alboglossiphonia heteroclita", "Ancylus fluviatilis", "Asellus aquaticus",
  "Bithynia tentaculata", "Bryozoa Gen. sp.", "Chironomidae Gen. sp.",
  "Coenagrionidae Gen. sp.", "Dendrocoelum lacteum", "Dreissena polymorpha",
  "Dugesia lugubris", "Dugesia lugubris/polychroa", "Dugesia tigrina",
  "Ephydatia fluviatilis", "Erpobdella octoculata", "Fredericella sultana",
  "Gammarus pulex", "Gammarus sp.", "Glossiphonia complanata",
  "Hydropsyche contubernalis ssp.", "Hydropsyche sp.", "Hydroptila sp.",
  "Physa fontinalis", "Physella acuta", "Planaria torva", "Platycnemis pennipes",
  "Plumatella fungosa", "Polycelis nigra", "Polycelis nigra/tenuis",
  "Radix balthica", "Radix balthica/labiata", "Sphaerium corneum",
  "Spongilla lacustris", "Spongillidae Gen. sp.", "Stylaria lacustris",
  "Tinodes waeneri ssp.", "Valvata piscinalis piscinalis", "Viviparus viviparus"
), abundance = c(
  109, 100, 1, 368, 1599, 402, 610, 92, 1, 16,
  3182, 50, 1, 241, 6, 1400, 20, 59, 6, 46, 20, 670, 65, 83, 24,
  1, 1, 6, 7, 7, 84, 991, 206, 7, 708, 1, 20, 1, 6
)), class = c(
  "tbl_df",
  "tbl", "data.frame"
), row.names = c(NA, -39L))

library(vegan)
library(tidyverse)

data <-
  list(
    data1 %>% rename(sample_1 = abundance),
    data2 %>% rename(sample_2 = abundance)
  ) %>%
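  # join on the shared key explicitly instead of relying on the guessed column
  reduce(~ full_join(.x, .y, by = "taxon")) %>%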
  # fill missing values with 0
  mutate(across(matches("^sample_"), ~ replace_na(.x, 0)))
data

# dissimilarities between taxa (vegdist compares rows, and rows are taxa here)
data %>%
  column_to_rownames("taxon") %>%
  vegdist(method = "bray")


# dissimilarities between samples (transpose so that rows are samples)
data %>%
  column_to_rownames("taxon") %>%
  t() %>%
  vegdist(method = "bray")

Please note that you might want to normalize the counts, e.g. to percentages, to control for differences in library size (the total count summed over all taxa) between the samples. This applies to relative count data, e.g. counts obtained from DNA barcode sequencing. Since you are counting big(ger) animals, you have absolute counts here, so no normalization is needed.
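
To illustrate what normalization changes, here is a small base R sketch with invented counts: two samples with identical composition but a tenfold difference in total count. The matrix `counts` and the helper `bray_curtis()` are assumptions for this example only. In vegan, decostand(x, method = "total") should give the same row-wise scaling as prop.table() below.

```r
# Hypothetical counts: same proportions, ten times more individuals in sample_2
counts <- rbind(
  sample_1 = c(sp_A = 10, sp_B = 30, sp_C = 60),
  sample_2 = c(sp_A = 100, sp_B = 300, sp_C = 600)
)

# Relative abundances: divide every row by its total so each row sums to 1
rel <- prop.table(counts, margin = 1)

# Hypothetical helper: Bray-Curtis dissimilarity of two abundance vectors
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

bc_raw <- bray_curtis(counts["sample_1", ], counts["sample_2", ])
bc_rel <- bray_curtis(rel["sample_1", ], rel["sample_2", ])

bc_raw  # large, driven only by the difference in totals
bc_rel  # zero, because the composition is identical
```

Which of the two is appropriate depends on whether the totals carry real information (absolute counts, as in the question) or only reflect sampling effort.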
