Counting number of unique IDs per group at certain time points [duplicate]

Time:09-28

I'm trying to find the number of participants per gene at different time points. I'm attempting to do this with a nested for loop, but I can't figure it out. Here's what I've been trying:

library(dplyr)

IgH_CDR3_post_challenge_unique <- select(IgH_CDR3_post_challenge_unique, cdr3aa, gene, ID, Timepoint)
participant_list <- unique(IgH_CDR3_post_challenge_unique$gene)
time_list <- unique(IgH_CDR3_post_challenge_unique$Timepoint)
for (c in participant_list)
{
  for (i in time_list)
  {
    IgH_CDR3_post_challenge_unique <- filter(IgH_CDR3_post_challenge_unique, Timepoint == time_list[i])
  }
  IgH_CDR3_post_challenge_unique$participant_per_gene[IgH_CDR3_post_challenge_unique$gene == c] <-
    length(unique(IgH_CDR3_post_challenge_unique$ID[IgH_CDR3_post_challenge_unique$gene == c]))
}

I would like the loops to calculate the number of unique participants (IDs) per gene at each time point.

My data looks something like this:

gene Timepoint ID
1    C0        SP1
2    C1        SP2
1    C0        SP4
3    C0        SP2
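To make the example reproducible, the sample rows above can be recreated as a data frame. This is a sketch: the name `dat` is borrowed from the answer below, and storing `gene` as an integer and the other two columns as character is an assumption about the real data.

```r
# Sample rows from the question; the column types are an assumption
dat <- data.frame(
  gene      = c(1L, 2L, 1L, 3L),
  Timepoint = c("C0", "C1", "C0", "C0"),
  ID        = c("SP1", "SP2", "SP4", "SP2"),
  stringsAsFactors = FALSE  # keep text columns as character on older R versions too
)
str(dat)
```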

Answer:

This can be done without a loop using dplyr. Loops that repeatedly subset a data frame tend to become slow and hard to follow as the data grows.
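For comparison, the nested loop from the question can be made to work with two changes: `for (i in time_list)` already yields each time point value, so `time_list[i]` looks up a name that does not exist and evaluates to `NA`; and the subset should go into a temporary variable instead of overwriting the data frame on every pass. A minimal base-R sketch, using the sample data; the names `genes`, `times`, `results` and `participants_per_gene` are illustrative, not from the question:

```r
# Sample data from the question (column types are an assumption)
dat <- data.frame(
  gene      = c(1L, 2L, 1L, 3L),
  Timepoint = c("C0", "C1", "C0", "C0"),
  ID        = c("SP1", "SP2", "SP4", "SP2"),
  stringsAsFactors = FALSE
)

genes <- unique(dat$gene)
times <- unique(dat$Timepoint)
results <- list()

for (g in genes) {
  for (tp in times) {
    # Subset into a temporary vector instead of overwriting dat
    ids <- dat$ID[dat$gene == g & dat$Timepoint == tp]
    # Only keep gene/time point combinations that actually occur
    if (length(ids) > 0) {
      results[[length(results) + 1]] <- data.frame(
        gene = g, Timepoint = tp, n = length(unique(ids)),
        stringsAsFactors = FALSE
      )
    }
  }
}

# Stack the one-row data frames into a single table
participants_per_gene <- do.call(rbind, results)
participants_per_gene
```

With the sample data this yields one row per gene/time point combination that occurs: gene 1 at C0 with 2 participants, gene 2 at C1 with 1, and gene 3 at C0 with 1.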

Use group_by() to group the data by both Timepoint and gene, then use summarise() to count the number of unique IDs in each group (n_distinct(ID) is dplyr's shorthand for length(unique(ID))).

library(dplyr)

dat %>%
  group_by(Timepoint, gene) %>%
  summarise(n = n_distinct(ID), .groups = "drop")
# A tibble: 3 × 3
  Timepoint  gene     n
  <chr>     <int> <int>
1 C0            1     2
2 C0            3     1
3 C1            2     1
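If you'd rather avoid the dplyr dependency, base R's aggregate() can do the same grouped count. This is a sketch on the sample data (column types assumed as above); aggregate() names the result column after the summarised variable, so it is renamed to `n` here to match the dplyr output.

```r
# Sample data from the question (column types are an assumption)
dat <- data.frame(
  gene      = c(1L, 2L, 1L, 3L),
  Timepoint = c("C0", "C1", "C0", "C0"),
  ID        = c("SP1", "SP2", "SP4", "SP2"),
  stringsAsFactors = FALSE
)

# For every Timepoint/gene combination present, count the distinct IDs
counts <- aggregate(ID ~ Timepoint + gene, data = dat,
                    FUN = function(x) length(unique(x)))

# aggregate() keeps the summarised column's name, so rename it to n
names(counts)[names(counts) == "ID"] <- "n"
counts
```

Only combinations that occur in the data appear in the result, the same as with group_by() and summarise().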