Convert dataframe to binary features in R

Time:11-10

I have a dataframe:

participants <- c("A", "A", "A", "B", "C", "C")

answers <- c("alpha", "beta", "gamma", "beta", "beta", "gamma")

df1 <- data.frame(participants, answers)

participants answers
A            alpha
A            beta
A            gamma
B            beta
C            beta
C            gamma

The real 'answers' column contains many more values than this small set.

How do I turn it into binary features like the following?

participant answers value
A           alpha   1
A           beta    1
A           gamma   1
B           alpha   0
B           beta    1
B           gamma   0
C           alpha   0
C           beta    1
C           gamma   1

My guess is that I have to get the levels of 'answers', and of 'participants' too?

But I'm not sure what to do after that. Thanks!

CodePudding user response:

In base R you could do:

data.frame(table(df1))
  participants answers Freq
1            A   alpha    1
2            B   alpha    0
3            C   alpha    0
4            A    beta    1
5            B    beta    1
6            C    beta    1
7            A   gamma    1
8            B   gamma    0
9            C   gamma    1

The rows above are sorted by answers rather than by participants, so the order differs from your table. To sort by participants instead, you could do:

a <- data.frame(table(df1))
a[order(a$participants), ]
  participants answers Freq
1            A   alpha    1
4            A    beta    1
7            A   gamma    1
2            B   alpha    0
5            B    beta    1
8            B   gamma    0
3            C   alpha    0
6            C    beta    1
9            C   gamma    1
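
One caveat: Freq is a count, not a strict 0/1 flag, so it would exceed 1 if a participant gave the same answer more than once. Here is a minimal sketch of clamping it to a binary value. The data frame df_dup and its extra duplicated row are hypothetical, added only for illustration and not taken from the question:

```r
# Hypothetical data: same as the question, plus a repeated (A, alpha) row
df_dup <- data.frame(
  participants = c("A", "A", "A", "B", "C", "C", "A"),
  answers      = c("alpha", "beta", "gamma", "beta", "beta", "gamma", "alpha")
)

# Count every participant/answer combination (absent pairs get 0)
counts <- data.frame(table(df_dup))

# Collapse counts to a 0/1 indicator: 1 if the pair occurred at all
counts$value <- as.integer(counts$Freq > 0)

# Sort by participant and keep only the columns asked for
counts <- counts[order(counts$participants),
                 c("participants", "answers", "value")]
counts
```

If every participant/answer pair occurs at most once in your data, this step is unnecessary and Freq can be used as is.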

CodePudding user response:

If the original data is 'df1', add a column of 1s and then use complete() to fill in the missing participant/answer combinations with 0:

library(tidyr)
library(dplyr)
df1 %>%
    mutate(value = 1) %>%
    complete(participants, answers, fill = list(value = 0))

Output:

# A tibble: 9 × 3
  participants answers value
  <chr>        <chr>   <dbl>
1 A            alpha       1
2 A            beta        1
3 A            gamma       1
4 B            alpha       0
5 B            beta        1
6 B            gamma       0
7 C            alpha       0
8 C            beta        1
9 C            gamma       1
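
To make it clearer what the complete() step is doing, here is a rough base R sketch of the same idea: build every participant/answer combination, mark the pairs that were observed, and fill the rest with 0. This is only an illustration of the logic, not how tidyr implements it, and the names grid, observed and out are made up for this example:

```r
# Same data as the question
df1 <- data.frame(
  participants = c("A", "A", "A", "B", "C", "C"),
  answers      = c("alpha", "beta", "gamma", "beta", "beta", "gamma")
)

# Every possible participant/answer combination
grid <- expand.grid(
  participants = unique(df1$participants),
  answers      = unique(df1$answers),
  stringsAsFactors = FALSE
)

# Mark the pairs that actually occur (unique() guards against repeats)
observed <- unique(df1)
observed$value <- 1

# Left join onto the full grid; pairs that never occurred get NA, then 0
out <- merge(grid, observed, all.x = TRUE)
out$value[is.na(out$value)] <- 0

# Sort to match the layout in the question
out <- out[order(out$participants, out$answers), ]
out
```

The tidyr version is shorter and is probably what you want in practice. The sketch is mainly useful if you cannot load extra packages.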

Data:

df1 <- structure(list(participants = c("A", "A", "A", "B", "C", "C"), 
    answers = c("alpha", "beta", "gamma", "beta", "beta", "gamma"
    )), class = "data.frame", row.names = c(NA, -6L))