How to add a new column to a dataframe using values from the rows?

Time:01-31

I'm trying to calculate a sum of multinomial probabilities. Think of this as an election: suppose there are 11 voters and three candidates. Candidate A has a 0.5 probability of being chosen, B has 0.3 and C has 0.2. I am interested in calculating the probability that A wins.

So I am taking all the possible scenarios in which A wins under plurality vote (the option which has the most votes wins) and calculating the probability of them happening and then summing all these values.
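Written out, the quantity described above is the sum below. This is only a sketch of the idea: the symbols a, b and c (the vote counts for A, B and C) are not in the original post, and it assumes a strict plurality, meaning A needs more votes than both B and C, which matches the outcomes listed in the table further down.

```latex
P(\text{A wins}) = \sum_{\substack{a + b + c = 11 \\ a > b,\ a > c}} \frac{11!}{a!\, b!\, c!}\, 0.5^{a}\, 0.3^{b}\, 0.2^{c}
```

For example, the outcome (5, 3, 3) contributes 11!/(5! 3! 3!) × 0.5^5 × 0.3^3 × 0.2^3 = 9240 × 0.00000675 = 0.06237.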

My problem comes when I try to calculate the multinomial distribution of each individual scenario in which A wins.

I have a dataframe with all the possible outcomes in which A wins and ideally I want a new column in which the probability of each happening is shown. Sort of like this:

   V1 V2 V3 dmultinom(x = c(5, 3, 3), prob = options)
1   5  2  4                                   0.06237
2   5  3  3                                   0.06237
3   5  4  2                                   0.06237
4   6  0  5                                   0.06237
5   6  1  4                                   0.06237
6   6  2  3                                   0.06237
7   6  3  2                                   0.06237
8   6  4  1                                   0.06237
9   6  5  0                                   0.06237
10  7  0  4                                   0.06237
11  7  1  3                                   0.06237
12  7  2  2                                   0.06237
13  7  3  1                                   0.06237
14  7  4  0                                   0.06237
15  8  0  3                                   0.06237
16  8  1  2                                   0.06237
17  8  2  1                                   0.06237
18  8  3  0                                   0.06237
19  9  0  2                                   0.06237
20  9  1  1                                   0.06237
21  9  2  0                                   0.06237
22 10  0  1                                   0.06237
23 10  1  0                                   0.06237
24 11  0  0                                   0.06237

But with the right values, of course.

I tried to access the values of the rows using $, but with no success. I also tried to create a new column holding the values of each row as a vector using dplyr, but couldn't do it either.

CodePudding user response:

Here's an option with data.table

library(data.table)

#create data.frame
xx <- data.frame(V1 = c(5, 5, 5, 6),
                 V2 = c(2, 3, 4, 0),
                 V3 = c(4, 3, 2, 5))

#convert the data.frame to a data.table
setDT(xx)

#put the data in long format
xx <- data.table::melt(xx,
                       measure.vars = names(xx))

#make a grouping variable
xx[, group := rep(1:4, 3)]

#apply function to each group
xx[, probability := dmultinom(value, prob = c(0.5, 0.3, 0.2)), by = "group"]

#pivot data back to wider format
#(assumes every probability is unique; duplicates would make dcast aggregate rows)
yy <- data.table::dcast(xx[, !c("group")],
                        probability ~ variable,
                        value.var = "value")
> yy

   probability V1 V2 V3
1:  0.00231000  6  0  5
2:  0.03118500  5  2  4
3:  0.06237000  5  3  3
4:  0.07016625  5  4  2

CodePudding user response:

Not the prettiest solution, but it can be accomplished with a for-loop.

First, creating the empty column:

dat$multinom <- 0

Next, iterate over the rows of the data frame, filling in the multinomial probability computed from V1, V2 and V3:

# options is assumed to hold the candidate probabilities, e.g. options <- c(0.5, 0.3, 0.2)
for (i in seq_len(nrow(dat))) {
  dat$multinom[i] <- dmultinom(x = c(dat$V1[i], dat$V2[i], dat$V3[i]), prob = options)
}
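The same row-wise calculation can also be written with `apply`, and the whole problem from the question can be put together in base R. This is a minimal sketch under stated assumptions: `n_voters`, `probs`, `outcomes`, `a_wins` and `p_a_wins` are hypothetical names (`probs` stands in for what the question calls `options`), and "A wins" is taken to mean A has strictly more votes than both B and C.

```r
# Assumed setup from the question: 11 voters, P(A) = 0.5, P(B) = 0.3, P(C) = 0.2
n_voters <- 11
probs <- c(0.5, 0.3, 0.2)

# Every way to split the votes: pick counts for A (V1) and B (V2), C (V3) gets the rest
outcomes <- expand.grid(V1 = 0:n_voters, V2 = 0:n_voters)
outcomes$V3 <- n_voters - outcomes$V1 - outcomes$V2
outcomes <- outcomes[outcomes$V3 >= 0, ]

# Keep only the outcomes where A has strictly more votes than B and C
a_wins <- outcomes[outcomes$V1 > outcomes$V2 & outcomes$V1 > outcomes$V3, ]

# Probability of each outcome: apply passes one row at a time to dmultinom
a_wins$probability <- apply(a_wins[, c("V1", "V2", "V3")], 1, dmultinom, prob = probs)

# Probability that A wins is the sum over all winning outcomes
p_a_wins <- sum(a_wins$probability)
```

Each row of `a_wins` then carries its own probability, and `p_a_wins` is their sum.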