How to paste the identity of counted objects into a new column in R?

I have the following data set of Ensembl genes, each annotated with a biotype, and some code that counts the number of genes per biotype.

genes <- c("ENSG01","ENSG02","ENSG03","ENSG04","ENSG05")
biotype <- c("protein_coding","protein_coding","protein_coding","lncRNA","lncRNA")
data <- data.frame(genes, biotype)
data
   genes        biotype
1 ENSG01 protein_coding
2 ENSG02 protein_coding
3 ENSG03 protein_coding
4 ENSG04         lncRNA
5 ENSG05         lncRNA

library(dplyr)

data_cts <- data %>%
    group_by(biotype) %>%
    dplyr::count()
data_cts
# A tibble: 2 × 2
# Groups:   biotype [2]
  biotype            n
  <chr>          <int>
1 lncRNA             2
2 protein_coding     3

How can I retain the Ensembl gene IDs of the counted genes in a new column, as shown below?

ENSEMBL <- c("ENSG04/ENSG05","ENSG01/ENSG02/ENSG03")
data_genes <- data.frame(data_cts, ENSEMBL)
data_genes
         biotype n              ENSEMBL
1         lncRNA 2        ENSG04/ENSG05
2 protein_coding 3 ENSG01/ENSG02/ENSG03

Thanks in advance

Answer:

Update (many thanks to @langtang): the code can be shortened to a single summarize() call that computes the count and pastes the gene IDs together per biotype:

library(dplyr)

data %>%
  group_by(biotype) %>%
  summarize(n = n(), ENSEMBL = paste0(genes, collapse = "/"))

The original, longer version adds the per-group count with add_count(), then groups by both biotype and n so that n is kept through the summarise step:

library(dplyr)

data %>% 
  group_by(biotype) %>% 
  add_count() %>%
  group_by(biotype, n) %>% 
  summarise(ENSEMBL = paste(genes, collapse = "/")) %>% 
  ungroup()

  biotype            n ENSEMBL             
  <chr>          <int> <chr>               
1 lncRNA             2 ENSG04/ENSG05       
2 protein_coding     3 ENSG01/ENSG02/ENSG03
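
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the example data from the question and checks the result against the desired output. The name `result` and the `stopifnot()` checks are my own additions for illustration, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Example data from the question
genes <- c("ENSG01", "ENSG02", "ENSG03", "ENSG04", "ENSG05")
biotype <- c("protein_coding", "protein_coding", "protein_coding", "lncRNA", "lncRNA")
data <- data.frame(genes, biotype)

# One row per biotype: the gene count and the "/"-joined gene IDs.
# `result` is a hypothetical name; .groups = "drop" returns an ungrouped tibble.
result <- data %>%
  group_by(biotype) %>%
  summarise(n = n(), ENSEMBL = paste(genes, collapse = "/"), .groups = "drop")

# Sanity checks against the output requested in the question
stopifnot(
  identical(result$n, c(2L, 3L)),
  identical(result$ENSEMBL, c("ENSG04/ENSG05", "ENSG01/ENSG02/ENSG03"))
)
```

Because n() and paste() are evaluated within the same groups, the count and the pasted IDs always agree, so the separate add_count() step is not needed.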