Can I use a dplyr pipe instead of lapplying over lists in R?

I'd like to know whether I can use the tidyverse for tasks that I have so far solved with lists in R. I have a matrix of species abundances per plot, for which I want to calculate dissimilarity indices with vegdist from the vegan package. After that I'd like to put the result into long format, remove self-comparisons, and so on. This works like a charm with dplyr for a simple example:

library(tidyverse)
library(vegan)
df <- data.frame(spec1 = sample.int(50, 10, replace = TRUE),
                 spec2 = sample.int(75, 10, replace = TRUE),
                 spec3 = sample.int(10, 10, replace = TRUE),
                 spec4 = sample.int(40, 10, replace = TRUE),
                 spec5 = sample.int(50, 10, replace = TRUE),
                 spec6 = sample.int(5, 10, replace = TRUE))

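df %>%
  vegdist() %>%                              # dissimilarities between plots (rows)
  as.matrix() %>%                            # full square matrix
  as_tibble(rownames = "rownames") %>%
  pivot_longer(-rownames) %>%                # one row per ordered pair of plots
  filter(rownames < name)                    # drop self-comparisons and mirrored pairs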
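A note on the last step: rownames and name are character columns at that point, so the comparison is lexicographic rather than numeric. The base R sketch below (the objects ids, pairs, kept_chr and kept_num are introduced here purely for illustration) suggests that the filter still keeps every pair of plots exactly once, but that some pairs come out as ("10", "2") instead of ("2", "10"). Comparing as integers is one way to get the conventional ordering, assuming the labels are plain row numbers as in this example.

```r
# Plot labels as they appear after as_tibble(rownames = "rownames")
ids <- as.character(1:10)

# Character comparison is lexicographic, not numeric
"10" < "2"   # TRUE
"2" < "10"   # FALSE

# All ordered pairs of labels (illustrative stand-in for the pivoted tibble)
pairs <- expand.grid(rownames = ids, name = ids, stringsAsFactors = FALSE)

# The string filter still keeps each unordered pair exactly once ...
kept_chr <- pairs[pairs$rownames < pairs$name, ]
nrow(kept_chr)   # 45, i.e. choose(10, 2)

# ... while a numeric comparison always puts the smaller plot number first
kept_num <- pairs[as.integer(pairs$rownames) < as.integer(pairs$name), ]
nrow(kept_num)   # 45
```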

Now I want to do the same, but the species belong to different categories. Each category has to get its own distance matrix, and only afterwards can the results be combined into a single long-format data frame or tibble.

cat <- data.frame(spec=c("spec1","spec2","spec3","spec4","spec5","spec6"),
                  group=c("a","b","c","b","a","c"))
df %>%
  pivot_longer(cols = everything(), values_to = "abundance", names_to = "spec") %>%
  left_join(cat, by = "spec")

The beginning is pretty straightforward, but at the point where I would normally split the data into a list by the column group, I am struggling to find a solution. I tried combinations of group_by, pivot_wider and vegdist, and also group_split, but unfortunately wasn't able to come up with something that works. Does anybody have suggestions, or should I stick with lists for such cases?

CodePudding user response：

We can split 'spec' by 'group' into a list, loop over that list with map_dfr, select the matching columns of df for each list element, and apply vegdist to them. The .id argument stores the list names in a 'group' column of the combined result.
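To make the first step concrete, here is a small base R sketch of what the split returns. The lookup table is the one from the question; it is called lookup here (a name chosen only for this snippet) so that the standalone example does not mask base::cat.

```r
# Lookup table from the question, renamed for this standalone snippet
lookup <- data.frame(spec  = c("spec1", "spec2", "spec3", "spec4", "spec5", "spec6"),
                     group = c("a", "b", "c", "b", "a", "c"),
                     stringsAsFactors = FALSE)

# One character vector of species names per group
spec_by_group <- split(lookup$spec, lookup$group)
str(spec_by_group)
# List of 3
#  $ a: chr [1:2] "spec1" "spec5"
#  $ b: chr [1:2] "spec2" "spec4"
#  $ c: chr [1:2] "spec3" "spec6"
```

Each element is then exactly what select(all_of(.x)) needs to pick the columns of one group.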

library(vegan)
library(purrr)
library(dplyr)
library(tidyr)
out <- map_dfr(split(cat$spec, cat$group),
  ~ df %>%
      select(all_of(.x)) %>%
      vegdist() %>%
      as.matrix() %>%
      as_tibble(rownames = "rownames") %>%
      pivot_longer(-rownames) %>%
      filter(rownames < name),
  .id = "group")

Output:

> out
# A tibble: 135 × 4
   group rownames name   value
   <chr> <chr>    <chr>  <dbl>
 1 a     1        2     0.268 
 2 a     1        3     0.35  
 3 a     1        4     0.208 
 4 a     1        5     0.0682
 5 a     1        6     0.328 
 6 a     1        7     0.196 
 7 a     1        8     0.189 
 8 a     1        9     0.178 
 9 a     1        10    0.178 
10 a     2        3     0.0822
# … with 125 more rows
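As a quick sanity check on the 135 rows, the count can be reproduced from the dimensions of the example alone. This is only arithmetic on the numbers given in the question (10 plots, 3 groups); the variable names are made up for the snippet.

```r
n_plots  <- 10                          # rows of df
n_groups <- 3                           # groups a, b and c

pairs_per_group <- choose(n_plots, 2)   # 45 unordered pairs of plots
total_rows <- pairs_per_group * n_groups
total_rows                              # 135
```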

If we want the result in wide format, with the columns of each group placed side by side:

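cat %>%
  # one list-column per group, each holding that group's species names
  pivot_wider(names_from = group, values_from = spec, values_fn = list) %>%
  # note: dplyr >= 1.1.0 warns when summarise() returns more than one row
  # per group and suggests reframe() instead
  summarise(across(everything(), ~
    df %>%
      select(all_of(unlist(.x))) %>%
      vegdist() %>%
      as.matrix() %>%
      as_tibble(rownames = "rownames") %>%
      pivot_longer(-rownames) %>%
      filter(rownames < name))) %>%
  unnest(where(is.list), names_sep = "_")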
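For completeness, the long-format result can probably also be reached in a single pipe by nesting per group, which is closer to the group_by / pivot_wider attempt described in the question. The following is a sketch under a few assumptions, not a tested replacement for the code above: the plot column, the lookup name (used instead of cat), the helper dist_long and the seed are all introduced here for illustration, and the filter compares the labels as integers.

```r
library(vegan)
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
df <- data.frame(spec1 = sample.int(50, 10, replace = TRUE),
                 spec2 = sample.int(75, 10, replace = TRUE),
                 spec3 = sample.int(10, 10, replace = TRUE),
                 spec4 = sample.int(40, 10, replace = TRUE),
                 spec5 = sample.int(50, 10, replace = TRUE),
                 spec6 = sample.int(5, 10, replace = TRUE))
lookup <- data.frame(spec  = c("spec1", "spec2", "spec3", "spec4", "spec5", "spec6"),
                     group = c("a", "b", "c", "b", "a", "c"))

# Hypothetical helper: long-format dissimilarities for the species of one group.
# `d` has the columns plot, spec and abundance.
dist_long <- function(d) {
  d %>%
    pivot_wider(names_from = spec, values_from = abundance) %>%
    arrange(plot) %>%                      # keep plots in their original order
    select(-plot) %>%
    vegdist() %>%
    as.matrix() %>%
    as_tibble(rownames = "rownames") %>%
    pivot_longer(-rownames) %>%
    filter(as.integer(rownames) < as.integer(name))
}

out2 <- df %>%
  mutate(plot = row_number()) %>%          # remember which row is which plot
  pivot_longer(-plot, names_to = "spec", values_to = "abundance") %>%
  left_join(lookup, by = "spec") %>%
  nest(data = -group) %>%                  # one nested tibble per group
  mutate(dist = map(data, dist_long)) %>%
  select(group, dist) %>%
  unnest(dist)
```

Whether this is preferable to splitting into a list is largely a matter of taste; the nested tibble is still a list-column underneath, so the two approaches do essentially the same work.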