Home > Net >  collapse rows, keep all unique variable values in one column and all values in other column
collapse rows, keep all unique variable values in one column and all values in other column

Time:12-23

I need to take this data and group all matching values in column ESVId, while retaining each unique value in column match; and all values in column Form that were associated with each value in columnmatch(there may be duplicates!).

structure(list(ESVId = c("ESV_000001", "ESV_000004", "ESV_000004", 
"ESV_000004", "ESV_000004", "ESV_000004", "ESV_000004", "ESV_000004", 
"ESV_000004", "ESV_000005", "ESV_000005", "ESV_000005", "ESV_000005", 
"ESV_000005", "ESV_000005", "ESV_000005", "ESV_000006", "ESV_000006", 
"ESV_000006", "ESV_000007"), MT_species = c(1, 1, 1, 1, 1, 1, 
1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1, 2, 2, 1), match = c("Pseudotsuga menziesii", 
"Artemisia dracunculus", "Achillea millefolium", "Artemisia absinthium", 
"Artemisia ludoviciana", "Artemisia frigida", "Artemisia campestris", 
"Artemisia tridentata", "Artemisia tilesii", "Rubus arcticus", 
"Fragaria vesca", "Rosa acicularis", "Fragaria virginiana", "Rosa woodsii", 
"Rosa arkansana", "Rubus ursinus", "Poa pratensis", "Vahlodea atropurpurea", 
"Alopecurus magellanicus", "Prunus virginiana"), Form = c("Conifer", 
NA, "Forb", NA, "Forb", "Sub-Shrub", "Forb", "Shrub", NA, NA, 
"Forb", "Shrub", "Forb", "Shrub", NA, NA, "Graminoid", NA, NA, 
"Shrub")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", 
"data.frame"))

When I tried

MTTaxa_funct <- funct_esvs %>%
  group_by(ESVId) %>%
  summarise_all(funs(paste(unique(match, Form), collapse= " OR ")))%>%
  dplyr::select(ESVId, match, Form) %>% 
  ungroup()

it fills column Form identically to match, which is not at all what I want. I also need to keep any NA values in column Form. Ideally this would come out looking like this:

structure(list(ESVId = c("ESV_000001", "ESV_000004", "ESV_000005", 
"ESV_000006", "ESV_000007"), match = c("Pseudotsuga menziesii", 
"Artemisia dracunculus OR Achillea millefolium OR Artemisia absinthium OR Artemisia ludoviciana OR Artemisia frigida OR Artemisia campestris OR Artemisia tridentata OR Artemisia tilesii", 
"Rubus arcticus OR Fragaria vesca OR Rosa acicularis OR Fragaria virginiana OR Rosa woodsii OR Rosa arkansana OR Rubus ursinus", 
"Poa pratensis OR Vahlodea atropurpurea OR Alopecurus magellanicus", 
"Prunus virginiana"
), Form = c("Conifer", "NA OR Forb OR NA OR Forb OR Sub-Shrub OR Forb OR Shrub OR NA", 
"NA OR Forb OR Shrub OR Forb OR Shrub OR NA OR NA", 
"Graminoid OR NA OR NA", 
"Shrub"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))

CodePudding user response:

I'm not sure if this is what you need, since your expected output has elements not present in your source data, but perhaps this?

quux %>%
  group_by(ESVId) %>%
  summarize(
    match = paste(unique(match), collapse = " OR "), 
    Form = paste(Form, collapse = " OR ")
  )
# # A tibble: 5 × 3
#   ESVId      match                                                         Form 
#   <chr>      <chr>                                                         <chr>
# 1 ESV_000001 Pseudotsuga menziesii                                         Coni…
# 2 ESV_000004 Artemisia dracunculus OR Achillea millefolium OR Artemisia a… NA O…
# 3 ESV_000005 Rubus arcticus OR Fragaria vesca OR Rosa acicularis OR Fraga… NA O…
# 4 ESV_000006 Poa pratensis OR Vahlodea atropurpurea OR Alopecurus magella… Gram…
# 5 ESV_000007 Prunus virginiana                                             Shrub
  • Related