Home > front end >  Find the counts of all elements in column
Find the counts of all elements in column

Time:12-14

Suppose I have a dataframe

related <- structure(list(IID1 = c("035-0028-0151526", "036-1002-0282844", 
"950-0001-0081329", "951-1000-0084436", "954-0009-0337439", "954-0011-0379455", 
"955-0017-0354237", "6011039", "6011039", "6011039", "6011039", 
"6205526", "A1858", "A2159"), IID2 = c("035-0028-0157134", "036-1002-0305628", 
"950-0001-0100509", "951-1000-0084537", "954-0009-0337641", "954-0011-0379657", 
"955-0017-0363137", "6205526", "6215777", "A1864", "A2267", "A1864", 
"A1864", "A2267")), row.names = c(NA, -14L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x38b6d80>)

I would like to find the counts for all elements in IID1 and IID2.

I tried this code below, but it only generates counts for unique IIDs and not all values (including duplicates).

   as.vector(aggregate(n ~ IID1, data=transform(related,n=1), length)[,2])
   as.vector(aggregate(n ~ IID2, data=transform(related,n=1), length)[,2])

I want to be able to do something like this below. What should be my approach?

related$nIID1 <- as.vector(aggregate(n ~ IID1, data=transform(related,n=1), length)[,2])
related$nIID2 <- as.vector(aggregate(n ~ IID2, data=transform(related,n=1), length)[,2])

CodePudding user response:

If we need to create a column, use ave instead of aggregate as aggregate returns the summarised output

related$nIID1 <- with(transform(related, n = 1),  ave(n, IID1, FUN = length))

CodePudding user response:

And another attempt with data.table only:

df_count = rbind(related[, IID:=IID1], related[, IID:=IID2])[, "IID"]

df_count[, .N, by=IID]

Output:

                 IID N
 1: 035-0028-0157134 2
 2: 036-1002-0305628 2
 3: 950-0001-0100509 2
 4: 951-1000-0084537 2
 5: 954-0009-0337641 2
 6: 954-0011-0379657 2
 7: 955-0017-0363137 2
 8:          6205526 2
 9:          6215777 2
10:            A1864 6
11:            A2267 4

Or if you want to count the combinations between IID1 and IID2:

related[, .N, by=.(IID1, IID2)]

                IID1             IID2 N
 1: 035-0028-0151526 035-0028-0157134 1
 2: 036-1002-0282844 036-1002-0305628 1
 3: 950-0001-0081329 950-0001-0100509 1
 4: 951-1000-0084436 951-1000-0084537 1
 5: 954-0009-0337439 954-0009-0337641 1
 6: 954-0011-0379455 954-0011-0379657 1
 7: 955-0017-0354237 955-0017-0363137 1
 8:          6011039          6205526 1
 9:          6011039          6215777 1
10:          6011039            A1864 1
11:          6011039            A2267 1
12:          6205526            A1864 1
13:            A1858            A1864 1
14:            A2159            A2267 1
  •  Tags:  
  • r
  • Related