Counting number of unique IDs per group at certain time points [duplicate]

I'm trying to find the number of participants per gene at different time points. I'm attempting to do this with a nested for loop, but I can't get it to work. Here's what I've been trying:

IgH_CDR3_post_challenge_unique<- select(IgH_CDR3_post_challenge_unique, cdr3aa, gene, ID, Timepoint)
participant_list <- unique(IgH_CDR3_post_challenge_unique$gene)
time_list<- unique(IgH_CDR3_post_challenge_unique$Timepoint)
for (c in participant_list)
{
  for(i in time_list) 
  {
    IgH_CDR3_post_challenge_unique <- filter(IgH_CDR3_post_challenge_unique, Timepoint==time_list[i] )
  }
    IgH_CDR3_post_challenge_unique$participant_per_gene[IgH_CDR3_post_challenge_unique$gene == c] <- length(unique(IgH_CDR3_post_challenge_unique$ID[IgH_CDR3_post_challenge_unique$gene == c]))
  }

I would like the loops to end up calculating the number of participants per gene for each timepoint.

My data looks something like this:

gene	Timepoint	ID
1	C0	SP1
2	C1	SP2
1	C0	SP4
3	C0	SP2

CodePudding user response：

This can be done without a loop by using dplyr. Loops tend to get slow and cumbersome as your data grows.

First, use group_by to group the data by the relevant columns (here Timepoint and gene), then count the number of unique IDs within each group. In the code below, dat stands for your data frame.

> library(dplyr)
> dat %>% group_by(Timepoint, gene) %>% summarise(n = length(unique(ID)))
# A tibble: 3 × 3
# Groups:   Timepoint [2]
  Timepoint  gene     n
  <chr>     <int> <int>
1 C0            1     2
2 C0            3     1
3 C1            2     1
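
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the four sample rows from the question. The name dat and the column types are assumptions, since the question does not show how the data was loaded. It uses n_distinct(ID), which is dplyr's shorthand for length(unique(ID)). The second pipeline uses mutate instead of summarise, so the count is attached to every original row; that is closer to the participant_per_gene column the question tries to create.

```r
library(dplyr)

# Assumed reconstruction of the sample rows shown in the question
dat <- data.frame(
  gene      = c(1L, 2L, 1L, 3L),
  Timepoint = c("C0", "C1", "C0", "C0"),
  ID        = c("SP1", "SP2", "SP4", "SP2"),
  stringsAsFactors = FALSE
)

# One row per Timepoint/gene combination with the number of distinct IDs
counts <- dat %>%
  group_by(Timepoint, gene) %>%
  summarise(n = n_distinct(ID), .groups = "drop")

# Same count attached to each original row instead of collapsing the rows
dat_with_counts <- dat %>%
  group_by(Timepoint, gene) %>%
  mutate(participant_per_gene = n_distinct(ID)) %>%
  ungroup()

print(counts)
print(dat_with_counts)
```

With these four rows, gene 1 at C0 has two distinct IDs (SP1 and SP4), and every other Timepoint/gene combination has one.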