Get the frequency of different observations from every two columns

Time:10-29

I have an R data frame as follows:

position    10164.0 10164.1 10192.0 10192.1 10316.0 10316.1 10349.0 10349.1 10418.0 10418.1
4414    1   1   1   1   1   1   1   1   1   1
5295    1   1   1   1   1   1   1   1   1   1
5738    1   1   1   1   1   1   1   1   1   1
5785    1   1   1   1   1   1   1   1   1   1
6392    1   1   1   1   1   1   1   1   1   1
7727    1   1   1   2   1   1   1   1   1   1
8876    1   1   1   2   1   1   1   1   1   1
9018    1   1   1   2   1   0   1   1   1   1
9208    0   1   1   2   1   0   1   2   1   1
9627    0   1   1   2   1   0   1   2   0   1

As you can see, from the 2nd column onwards the column names have a suffix (either .0 or .1). I would like to count the number of observations in column 1 (position) and multiply it by 2 (in this case that gives 20). Then, for every pair of columns that share the same name before the .0/.1 suffix, I would like to count how often each distinct value occurs across both columns and divide each count by that total (the number of positions * 2).
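To make the arithmetic concrete, here is a minimal sketch for a single pair, using the values of the 10192.0 and 10192.1 columns from the table above. The vector names (x0, x1, pooled, freq) are only illustrative and are not part of my actual script.

```r
# Values copied from the example table for one column pair.
x0 <- rep(1, 10)                 # column 10192.0: ten 1s
x1 <- c(rep(1, 5), rep(2, 5))    # column 10192.1: five 1s, then five 2s

# Denominator: number of positions * 2.
total <- 2 * length(x0)

# Pool both columns and count each value over the fixed levels 0, 1, 2.
pooled <- factor(c(x0, x1), levels = 0:2)
freq <- table(pooled) / total
freq
```

For this pair the value 1 occurs 15 times out of 20 and the value 2 occurs 5 times out of 20, which is where the 0.7500 and 0.2500 in the desired output come from.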

This is what I have so far, using the R code below:

df = read.table(file = 'df.tsv', sep = '\t', header = TRUE)
df2 = sapply(df, function(x) table(factor(x, levels = c("0", "1", "2"))))
write.table(df2, file='df2.tsv', quote=FALSE, sep='\t')

This gives me:

observations position   X10164.0    X10164.1    X10192.0    X10192.1    X10316.0    X10316.1    X10349.0    X10349.1    X10418.0    X10418.1
0   0   2   0   0   0   0   3   0   0   1   0
1   0   8   10  10  5   10  7   10  8   9   10
2   0   0   0   0   5   0   0   0   2   0   0

The desired output would be the transposed data frame, with the two columns of each pair pooled, as follows:

observations    0   1   2
X10164  0.1000  0.9000  0.0000
X10192  0.0000  0.7500  0.2500
X10316  0.1500  0.8500  0.0000
X10349  0.0000  0.9000  0.1000
X10418  0.0500  0.9500  0.0000

CodePudding user response:

In base R you could do:

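# Reshape to long format: each .0/.1 pair is stacked into one column named by the shared prefix.
a <- reshape(df, varying = -1, direction = 'long', idvar = 'position')[-(1:2)]  # drop position and time
# Cross-tabulate value by column, convert to per-column proportions, then transpose.
t(prop.table(table(stack(a)), 2))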

       values
ind         0    1    2
  X10164 0.10 0.90 0.00
  X10192 0.00 0.75 0.25
  X10316 0.15 0.85 0.00
  X10349 0.00 0.90 0.10
  X10418 0.05 0.95 0.00
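If the reshape() call feels opaque, the same result can probably be reached more explicitly by grouping the columns on their shared prefix. The following is only a sketch under a few assumptions: the data is rebuilt inline instead of being read from df.tsv, the suffix is always .0 or .1, and the names vals, prefix, counts and freq are made up for illustration.

```r
# Rebuild the example data inline so the sketch is self-contained.
# read.table prepends "X" to the numeric column names, as in the question.
df <- read.table(header = TRUE, text = "
position 10164.0 10164.1 10192.0 10192.1 10316.0 10316.1 10349.0 10349.1 10418.0 10418.1
4414 1 1 1 1 1 1 1 1 1 1
5295 1 1 1 1 1 1 1 1 1 1
5738 1 1 1 1 1 1 1 1 1 1
5785 1 1 1 1 1 1 1 1 1 1
6392 1 1 1 1 1 1 1 1 1 1
7727 1 1 1 2 1 1 1 1 1 1
8876 1 1 1 2 1 1 1 1 1 1
9018 1 1 1 2 1 0 1 1 1 1
9208 0 1 1 2 1 0 1 2 1 1
9627 0 1 1 2 1 0 1 2 0 1
")

vals <- df[-1]                               # drop the position column
prefix <- sub("\\.[01]$", "", names(vals))   # "X10164.0" -> "X10164"

# Split the columns by prefix, pool each pair, and count the values 0, 1, 2.
counts <- sapply(split.default(vals, prefix), function(pair) {
  table(factor(unlist(pair), levels = 0:2))
})

# Transpose so each pair is a row, then divide by number of positions * 2.
freq <- t(counts) / (2 * nrow(df))
freq
```

Fixing the factor levels to 0:2 keeps a column for every value even when a pair never contains it, which is the same trick used in the question's sapply() call.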