how to create matrix of joint distribution P[x,y] from a table of dataset

Time:09-28

For example, given a table of 100 pairs of x and y values, how can I create the matrix of the joint distribution P[x,y] in R? I tried to solve it, but I have no idea how to start.

Answer:

For example, let's draw x and y from categorical distributions:

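library(extraDistr)

# Draw 100 values of x from a uniform categorical distribution on 1..6
# and, independently, 100 values of y from one on 1..4
x <- rcat(100, rep(1, 6) / 6)
y <- rcat(100, rep(1, 4) / 4)
z <- cbind(x, y)   # 100 x 2 matrix: one (x, y) pair per row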

where z is the joint sample. Then:

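library(dplyr)

# Cross-tabulate the (x, y) pairs, then divide the counts by the number of pairs
# (group_by() is not needed: table() already counts every combination)
z %>%
  as.data.frame() %>%
  table() / nrow(z)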

   y
x      1    2    3    4
  1 0.02 0.06 0.01 0.02
  2 0.04 0.05 0.06 0.08
  3 0.06 0.03 0.04 0.03
  4 0.02 0.03 0.04 0.05
  5 0.04 0.04 0.07 0.02
  6 0.07 0.04 0.01 0.07

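gives the sample (empirical) joint distribution of the two discrete variables: each cell is the proportion of the 100 pairs with that combination of x and y. Since no random seed is set, your numbers will differ from run to run.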
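Because x and y in this example are simulated independently with equal probabilities, the distribution the sample comes from is known exactly, which gives a handy check on the estimate. A minimal base-R sketch, assuming the same rep(1, 6)/6 and rep(1, 4)/4 probabilities used in the simulation:

```r
# Probabilities used to simulate x and y in the example above
px <- rep(1, 6) / 6   # P(x = i), i = 1, ..., 6
py <- rep(1, 4) / 4   # P(y = j), j = 1, ..., 4

# x and y were drawn independently, so the population joint distribution
# is the outer product of the marginals: P[x, y] = P(x) * P(y)
P_true <- outer(px, py)
dimnames(P_true) <- list(x = 1:6, y = 1:4)

P_true        # every cell is 1/24, about 0.042
sum(P_true)   # a joint distribution sums to 1
```

The sample proportions above scatter around 1/24 because only 100 pairs are spread over 24 cells.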

With @Alex's data

dummy <- read.csv("x.csv")
head(dummy)
  x y
1 3 3
2 1 3
3 1 3
4 2 2
5 1 4
6 3 4

Applying the same code (dividing by the number of rows of dummy rather than z):

# dummy is already a data frame with columns x and y
dummy %>%
  table() / nrow(dummy)

   y
x      1    2    3    4
  1 0.03 0.11 0.20 0.09
  2 0.01 0.09 0.16 0.04
  3 0.03 0.10 0.11 0.03
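One caveat: table() only creates rows and columns for values that actually occur in the data. If the set of possible values is known in advance, converting to factors with explicit levels keeps unobserved values as rows or columns of zeros. A minimal sketch, using a small hypothetical sample in which y = 4 never occurs:

```r
# Hypothetical small sample in which the value y = 4 never occurs
x <- c(1, 2, 2, 3, 1)
y <- c(1, 3, 2, 1, 3)

table(x, y) / length(x)   # only columns for y = 1, 2, 3

# Declaring the full support as factor levels keeps y = 4 as a zero column
P <- table(x = factor(x, levels = 1:3),
           y = factor(y, levels = 1:4)) / length(x)
P
```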

Answer:

Here is an approach using base R. First, provide reproducible data (the vectors below are in the form that dput() prints):

x <- c(3, 1, 1, 2, 1, 3, 2, 3, 2, 2, 2, 3, 3, 1, 1, 2, 1, 2, 1, 1, 
     2, 3, 1, 2, 3, 2, 2, 2, 3, 1, 2, 2, 2, 1, 1, 3, 3, 1, 2, 3, 1, 
     1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 3, 1, 1, 3, 1, 1, 1, 3, 1, 1, 
     3, 1, 2, 2, 3, 3, 1, 1, 3, 3, 2, 3, 2, 1, 2, 3, 3, 2, 1, 3, 2, 
     1, 2, 1, 1, 2, 2, 3, 3, 3, 1, 2, 1, 1, 3, 1, 2, 2)

y <- c(3, 3, 3, 2, 4, 4, 2, 4, 3, 2, 2, 1, 2, 3, 3, 2, 3, 3, 2, 4, 
     3, 2, 3, 3, 3, 1, 3, 3, 1, 4, 2, 4, 2, 3, 3, 3, 3, 3, 4, 3, 4, 
     4, 3, 2, 2, 2, 3, 2, 4, 4, 4, 3, 2, 3, 2, 3, 3, 2, 3, 3, 4, 3, 
     3, 3, 3, 4, 2, 2, 3, 2, 3, 1, 3, 2, 3, 3, 3, 2, 3, 3, 2, 2, 4, 
     1, 3, 3, 2, 2, 3, 2, 4, 2, 2, 3, 1, 2, 3, 1, 3, 3)

Then tabulate:

(tbl <- table(x, y))
#    y
# x    1  2  3  4
#   1  3 11 20  9
#   2  1  9 16  4
#   3  3 10 11  3
(tbl.prp <- prop.table(tbl))
#    y
# x      1    2    3    4
#   1 0.03 0.11 0.20 0.09
#   2 0.01 0.09 0.16 0.04
#   3 0.03 0.10 0.11 0.03

sum(tbl)
# [1] 100
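A table object is already an array, so turning the proportions into a plain matrix and reading off the marginal distributions takes only a couple of base-R calls. A minimal sketch, using just the first six (x, y) pairs from the data above to keep it short:

```r
# First six (x, y) pairs from the data above, only to keep the sketch short
x <- c(3, 1, 1, 2, 1, 3)
y <- c(3, 3, 3, 2, 4, 4)

tbl.prp <- prop.table(table(x, y))   # empirical joint distribution

# unclass() drops the "table" class, leaving a plain numeric matrix
# whose dimnames are the observed values of x (rows) and y (columns)
P <- unclass(tbl.prp)
is.matrix(P)      # TRUE
P["1", "3"]       # P[x = 1, y = 3] = 2/6

# Marginal distributions: sum the joint matrix over the other variable
px <- rowSums(P)  # P(x)
py <- colSums(P)  # P(y)
```

Indexing by the character labels ("1", "3") rather than by position keeps lookups correct even when some values are missing from the data.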