For example, given a table with 100 pairs of x and y values, how can I create a matrix of the joint distribution P[x, y] in R? I tried to solve it, but I have no idea how to start.
CodePudding user response:
For example, let's draw x and y from categorical distributions:
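library(extraDistr)
x <- rcat(100, rep(1, 6) / 6)  # 100 draws from 6 equally likely categories
y <- rcat(100, rep(1, 4) / 4)  # 100 draws from 4 equally likely categories
z <- cbind(x, y)               # 100 x 2 matrix of (x, y) pairs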
where z is the joint sample. Then,
library(dplyr)
z %>%
as.data.frame %>%
group_by(x,y) %>%
table()/dim(z)[1]
   y
x      1    2    3    4
  1 0.02 0.06 0.01 0.02
  2 0.04 0.05 0.06 0.08
  3 0.06 0.03 0.04 0.03
  4 0.02 0.03 0.04 0.05
  5 0.04 0.04 0.07 0.02
  6 0.07 0.04 0.01 0.07
gives you a sample joint distribution for discrete variables: each cell is the fraction of the 100 pairs that fall in that (x, y) combination.
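If extraDistr is not available, the same idea can be sketched in base R. This is only an illustration under assumptions: the seed, the sample size n, and the use of sample() in place of rcat() are choices made here, not part of the original answer. Wrapping the draws in factor() with explicit levels keeps every category in the table even if it happens not to be drawn.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable (assumption)
n <- 100     # number of (x, y) pairs

# sample() with replace = TRUE draws each category with equal probability
x <- factor(sample(1:6, n, replace = TRUE), levels = 1:6)
y <- factor(sample(1:4, n, replace = TRUE), levels = 1:4)

# Counts of each (x, y) combination divided by the sample size
joint <- table(x, y) / n

dim(joint)  # 6 rows (x categories) by 4 columns (y categories)
sum(joint)  # the cells of a joint distribution add up to 1
```

The individual cell values depend on the random draws, so they will not match the table printed above.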
With @Alex's data
dummy <- read.csv("x.csv")
head(dummy)
  x y
1 3 3
2 1 3
3 1 3
4 2 2
5 1 4
6 3 4
Apply the same code as above:
dummy %>%
as.data.frame %>%
group_by(x,y) %>%
table()/dim(dummy)[1]
   y
x      1    2    3    4
  1 0.03 0.11 0.20 0.09
  2 0.01 0.09 0.16 0.04
  3 0.03 0.10 0.11 0.03
CodePudding user response:
Here is an approach using base R. First, provide reproducible data using dput():
x <- c(3, 1, 1, 2, 1, 3, 2, 3, 2, 2, 2, 3, 3, 1, 1, 2, 1, 2, 1, 1,
2, 3, 1, 2, 3, 2, 2, 2, 3, 1, 2, 2, 2, 1, 1, 3, 3, 1, 2, 3, 1,
1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 3, 1, 1, 3, 1, 1, 1, 3, 1, 1,
3, 1, 2, 2, 3, 3, 1, 1, 3, 3, 2, 3, 2, 1, 2, 3, 3, 2, 1, 3, 2,
1, 2, 1, 1, 2, 2, 3, 3, 3, 1, 2, 1, 1, 3, 1, 2, 2)
y <- c(3, 3, 3, 2, 4, 4, 2, 4, 3, 2, 2, 1, 2, 3, 3, 2, 3, 3, 2, 4,
3, 2, 3, 3, 3, 1, 3, 3, 1, 4, 2, 4, 2, 3, 3, 3, 3, 3, 4, 3, 4,
4, 3, 2, 2, 2, 3, 2, 4, 4, 4, 3, 2, 3, 2, 3, 3, 2, 3, 3, 4, 3,
3, 3, 3, 4, 2, 2, 3, 2, 3, 1, 3, 2, 3, 3, 3, 2, 3, 3, 2, 2, 4,
1, 3, 3, 2, 2, 3, 2, 4, 2, 2, 3, 1, 2, 3, 1, 3, 3)
Then tabulate:
(tbl <- table(x, y))
#    y
# x    1  2  3  4
#   1  3 11 20  9
#   2  1  9 16  4
#   3  3 10 11  3
(tbl.prp <- prop.table(tbl))
#    y
# x      1    2    3    4
#   1 0.03 0.11 0.20 0.09
#   2 0.01 0.09 0.16 0.04
#   3 0.03 0.10 0.11 0.03
sum(tbl)
# [1] 100
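A quick way to check a joint table like this is to confirm that it sums to 1 and to look at its marginals. The sketch below uses a hypothetical five-pair sample (not the data above) so the numbers are easy to verify by hand; the names joint, p_x and p_y are chosen here for illustration.

```r
# Hypothetical toy sample of five (x, y) pairs
x <- c(1, 1, 2, 2, 3)
y <- c(1, 2, 1, 1, 2)

# Joint distribution: cell counts divided by the total count
joint <- prop.table(table(x, y))

# Marginal distributions: sum the joint over the other variable
p_x <- rowSums(joint)  # P[x]: 0.4, 0.4, 0.2
p_y <- colSums(joint)  # P[y]: 0.6, 0.4

sum(joint)             # 1
```

Applied to the tbl.prp printed above, the same row and column sums would give 0.43, 0.30, 0.27 for x and 0.07, 0.30, 0.47, 0.16 for y.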