Home > Enterprise >  How to generate string counts in defferent samples by R
How to generate string counts in defferent samples by R

Time:11-24

Let's say I have a data table as follow:

ID1 ID2  ID3
  a   a    b
  a   b    b
  b   b    b
  c   c    c
  c   c    d
  c   d    d
  d        e
  d        e
  e

Then I want to convert it as like following structure:

Samples ID1 ID2 ID3
a        2  1   0
b        1  2   3
c        3  2   1
d        2  1   2
e        1  0   2

Would any of you please help me with R or bash code to achieve such transformation?

CodePudding user response:

Try the R code below

> table(stack(df))
      ind
values ID1 ID2 ID3
     a   2   1   0
     b   1   2   3
     c   3   2   1
     d   2   1   2
     e   1   0   2

data

> dput(df)
structure(list(ID1 = c("a", "a", "b", "c", "c", "c", "d", "d", 
"e"), ID2 = c("a", "b", "b", "c", "c", "d", NA, NA, NA), ID3 = c("b",
"b", "b", "c", "d", "d", "e", "e", NA)), class = "data.frame", row.names = c(NA,
-9L))

CodePudding user response:

An option with tidyverse - reshape to 'long' format with pivot_longer, get the count and reshape back to 'wide' format with pivot_wider

library(dplyr)
library(tidyr)
df %>%
   pivot_longer(everything(), values_drop_na = TRUE, values_to = 'Samples') %>%
   count(name, Samples) %>% 
   pivot_wider(names_from = name, values_from = n, values_fill = 0)

-output

# A tibble: 5 × 4
  Samples   ID1   ID2   ID3
  <chr>   <int> <int> <int>
1 a           2     1     0
2 b           1     2     3
3 c           3     2     1
4 d           2     1     2
5 e           1     0     2

data

df <- structure(list(ID1 = c("a", "a", "b", "c", "c", "c", "d", "d", 
"e"), ID2 = c("a", "b", "b", "c", "c", "d", NA, NA, NA), ID3 = c("b",
"b", "b", "c", "d", "d", "e", "e", NA)), class = "data.frame", 
row.names = c(NA,
-9L))
  • Related