Home > Software engineering >  Count occurrences of value in multiple columns with duplicates
Count occurrences of value in multiple columns with duplicates

Time:03-26

My problem is very similar to: R: Count occurrences of value in multiple columns

However, the solution proposed there doesn't work for me because in the same row the value may appear twice but I want to count only the rows where this appears. I have worked out a solution but it seems too long:

> toy_data = data.table(from=c("A","A","A","C","E","E"), to=c("B","C","A","D","F","E"))
> toy_data
   from to
1:    A  B
2:    A  C
3:    A  A
4:    C  D
5:    E  F
6:    E  E
> #get a table with intra-link count
> A = data.table(table(unlist(toy_data[from==to,from ])))
> A
   V1 N
1:  A 1
2:  E 1
A #get a table with total count
> B = data.table(table(unlist(toy_data[,c(from,to)])))
> B
   V1 N
1:  A 4
2:  B 1
3:  C 2
4:  D 1
5:  E 3
6:  F 1
> 
> # concatenate changing sign
> table = rbind(B,A[,.(V1,-N)],use.names=FALSE)
> # groupby and subtract
> table[,sum(N),by=V1]
   V1 V1
1:  A  3
2:  B  1
3:  C  2
4:  D  1
5:  E  2
6:  F  1

Is there some function that would do the job in less lines? I thought in python I'd concatenate from and to then match(), cannot find the right sintax though

EDIT: I know this would work A=length(toy_data[from=="A"|to=="A",from]) but I would like avoiding loops among the various "A","B"... (and I don't know how to format output in this way)

CodePudding user response:

You can try the code below

> toy_data[, to := replace(to, from == to, NA)][, data.frame(table(unlist(.SD)))]
  Var1 Freq
1    A    3
2    B    1
3    C    2
4    D    1
5    E    2
6    F    1

or

toy_data %>%
    mutate(to = replace(to, from == to, NA)) %>%
    unlist() %>%
    table() %>%
    as.data.frame()

which gives

  . Freq
1 A    3
2 B    1
3 C    2
4 D    1
5 E    2
6 F    1

CodePudding user response:

Using data.table

library(data.table)
toy_data[from == to, to := NA][, .(to = na.omit(c(from, to)))][, .N, to]

CodePudding user response:

Using to:=NA as suggested by akrun, one can wrap the result in table(unlist()) and convert to data.table

data.table(table(unlist(toy_data[from==to, to:=NA, from])))

CodePudding user response:

You could just subset the to vector:

data.table(table(unlist(toy_data[,c(from,to[to!=from])])))

   V1 N
1:  A 3
2:  B 1
3:  C 2
4:  D 1
5:  E 2
6:  F 1
  • Related