How to name all rows in data.frame() whilst generating the data frame?-CodePudding

I want to produce a table similar to:

But I'm having a hard time naming the rows because the GeneName should just be a label that species the table belongs to that gene.

My code for the data frame is:

geneTable <-  data.frame(presenceofvariant = c("Yes", "No"), 
                           A = c(1, 4),
                           B = c(2, 4))

I want the column label above "Yes" and "No" to be empty but this doesn't seem possible either. How do I add an overall row name - I.e the gene all this data belongs to?

CodePudding user response：

Not sure if you want this. You can use the argument row.names to supply row names, and use check.names = FALSE to have column names with special syntax.

data.frame(" " = c("Yes", "No"), 
           A = c(1, 4),
           B = c(2, 4), 
           row.names = c("GeneName", ""), 
           check.names = F)

             A B
GeneName Yes 1 2
          No 4 4

To my knowledge, it's not possible to have duplicated row names in dataframe. If you really want that, we can use matrix instead of dataframe.

matrix(c("Yes", "No", 1, 4, 2, 4), 
       nrow = 2, 
       dimnames = list(c("GeneName", "GeneName"), c("", "A", "B")))

               A   B  
GeneName "Yes" "1" "2"
GeneName "No"  "4" "4"

CodePudding user response：

When you say 'overall row name' I think the dimnames of a matrix would suit well here. A data.frame is strictly rectangular and doesn't allow this type of 'meta labeling' - instead you'd have to just add another column. So to your example, you could do this:

matrix(1:4, 
       nrow = 2, 
       dimnames = list("GeneName" = c("YES", "NO"), 
                       "categories" = c("A", "B")))
#>         categories
#> GeneName A B
#>      YES 1 3
#>      NO  2 4

^{Created on 2022-04-08 by the reprex package (v2.0.1)}

Alternatively, a 'tidy' way to do this in a data.frame might be:

data.frame(gene = c("geneA", "geneA", "geneB", "geneB"), 
           Answer = c("YES", "NO", "YES", "NO"),
           A = sample(4),
           B = sample(4))
#>    gene Answer A B
#> 1 geneA    YES 2 3
#> 2 geneA     NO 3 4
#> 3 geneB    YES 4 1
#> 4 geneB     NO 1 2

^{Created on 2022-04-08 by the reprex package (v2.0.1)}