Transform dataframe to binary dataframe in R

Time:07-03

I have a data frame such as:

Groups Names
G1     SP1
G1     SP2
G1     SP3
G2     SP1
G2     SP4
G3     SP2
G3     SP1 

And I would like to transform it into:

  Names G1 G2 G3
  SP1   1  1  1
  SP2   1  0  1  
  SP3   1  0  0 
  SP4   0  1  0

where the columns are the Groups, and each cell is 1 if the name is present in that group and 0 if it is absent.

Here is the data in dput format:

structure(list(Groups = c("G1", "G1", "G1", "G2", "G2", "G3", 
"G3"), Names = c("SP1", "SP2", "SP3", "SP1", "SP4", "SP2", "SP1"
)), class = "data.frame", row.names = c(NA, -7L))

Answer:

Use table:

table(df$Names, df$Groups)
     
      G1 G2 G3
  SP1  1  1  1
  SP2  1  0  1
  SP3  1  0  0
  SP4  0  1  0
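The desired output in the question has Names as an ordinary column rather than as row names. A minimal base R sketch of getting exactly that layout from the table result (the intermediate names tab and res are only illustrative):

```r
# The question's data (same as the dput output above)
df <- data.frame(
  Groups = c("G1", "G1", "G1", "G2", "G2", "G3", "G3"),
  Names  = c("SP1", "SP2", "SP3", "SP1", "SP4", "SP2", "SP1")
)

# Cross-tabulate: rows are Names, columns are Groups
tab <- table(df$Names, df$Groups)

# Turn the 2-D table into a wide data frame, one column per group
res <- as.data.frame.matrix(tab)

# Move the row names into a proper Names column, as in the desired output
res <- data.frame(Names = rownames(res), res, stringsAsFactors = FALSE)
rownames(res) <- NULL

print(res)
```

The cells stay integer counts, so they are only guaranteed to be 0/1 when the data has no duplicated rows.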

Answer:

Expanding my comment into an answer.

This is known as a contingency table, and it can be computed in several ways without any add-on packages.

dat <- structure(list(Groups = c("G1", "G1", "G1", "G2", "G2", "G3", 
"G3"), Names = c("SP1", "SP2", "SP3", "SP1", "SP4", "SP2", "SP1"
)), class = "data.frame", row.names = c(NA, -7L))

mat1 <- with(dat, table(Names, Groups))
#     Groups
#Names G1 G2 G3
#  SP1  1  1  1
#  SP2  1  0  1
#  SP3  1  0  0
#  SP4  0  1  0

mat2 <- xtabs(~ Names + Groups, dat)
#     Groups
#Names G1 G2 G3
#  SP1  1  1  1
#  SP2  1  0  1
#  SP3  1  0  0
#  SP4  0  1  0

Such a table is a matrix (with class "table"). If you want a data frame, coerce it using:

data.frame(unclass(mat1))
#    G1 G2 G3
#SP1  1  1  1
#SP2  1  0  1
#SP3  1  0  0
#SP4  0  1  0

data.frame(unclass(mat2))
#    G1 G2 G3
#SP1  1  1  1
#SP2  1  0  1
#SP3  1  0  0
#SP4  0  1  0

Remark:

In your case, the data frame must not have duplicated rows; otherwise the contingency table will contain counts greater than 1, not just 0 and 1. In this sense, computing a contingency table is actually overkill. An algorithmically simpler way (although it takes more lines of code) is:

m1 <- unique(dat$Names)
m2 <- unique(dat$Groups)
mat <- matrix(0, length(m1), length(m2), dimnames = list(m1, m2))
mat[with(dat, cbind(Names, Groups))] <- 1
#    G1 G2 G3
#SP1  1  1  1
#SP2  1  0  1
#SP3  1  0  0
#SP4  0  1  0
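To illustrate the remark about duplicated rows, here is a small sketch that adds one hypothetical duplicate row (G1, SP1) to the data. It shows that the plain contingency table then holds a 2 in that cell, two base R ways to get back to 0/1, and that the matrix-assignment approach is unaffected, because assigning 1 to the same cell twice still leaves 1:

```r
# Question's data plus one hypothetical duplicated row (G1, SP1)
dat <- data.frame(
  Groups = c("G1", "G1", "G1", "G2", "G2", "G3", "G3", "G1"),
  Names  = c("SP1", "SP2", "SP3", "SP1", "SP4", "SP2", "SP1", "SP1")
)

# A plain contingency table counts the duplicate: cell (SP1, G1) becomes 2
counts <- with(dat, table(Names, Groups))

# Option 1: collapse any positive count to 1 (presence/absence)
bin1 <- (counts > 0) * 1L

# Option 2: drop duplicated rows before tabulating
bin2 <- with(unique(dat), table(Names, Groups))

# The matrix-assignment approach: duplicates just write 1 again
m1 <- unique(dat$Names)
m2 <- unique(dat$Groups)
mat <- matrix(0, length(m1), length(m2), dimnames = list(m1, m2))
mat[with(dat, cbind(Names, Groups))] <- 1
```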

Answer:

You can call table directly on df, either with

> t(table(df))
     Groups
Names G1 G2 G3
  SP1  1  1  1
  SP2  1  0  1
  SP3  1  0  0
  SP4  0  1  0

or

> table(rev(df))
     Groups
Names G1 G2 G3
  SP1  1  1  1
  SP2  1  0  1
  SP3  1  0  0
  SP4  0  1  0
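Both calls rely on df having exactly the two columns Groups and Names, in that order. A minimal variant, assuming the column names from the question, selects the columns by name, so the result has Names as rows and Groups as columns regardless of column order or of any extra columns:

```r
df <- data.frame(
  Groups = c("G1", "G1", "G1", "G2", "G2", "G3", "G3"),
  Names  = c("SP1", "SP2", "SP3", "SP1", "SP4", "SP2", "SP1")
)

# The first selected column becomes the rows, the second the columns
tab <- table(df[c("Names", "Groups")])
print(tab)
```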