Row names disappear after as.matrix

I noticed that if the row names of a data frame are the sequence of numbers from 1 to the number of rows, the row names disappear after using as.matrix. But the row names are kept if they are not such a sequence.

Here is a reproducible example:

test <- as.data.frame(list(x=c(0.1, 0.1, 1), y=c(0.1, 0.2, 0.3)))
rownames(test)
# [1] "1" "2" "3"

rownames(as.matrix(test))
# NULL

rownames(as.matrix(test[c(1, 3), ]))
# [1] "1" "3"

Does anyone have an idea on what is going on?

Thanks a lot

CodePudding user response：

First and foremost, we always have a numerical index for sub-setting that won't disappear and that we should not confuse with row names.

as.matrix(test)[c(1, 3), ]
#        x   y
# [1,] 0.1 0.1
# [2,] 1.0 0.3

WHAT is going on when you call rownames() can be seen in the source code of base::rownames(), which simply returns the first component of the dimnames:

function (x, do.NULL = TRUE, prefix = "row") 
{
  dn <- dimnames(x)
  if (!is.null(dn[[1L]])) 
    dn[[1L]]
  else {
    nr <- NROW(x)
    if (do.NULL) 
      NULL
    else if (nr > 0L) 
      paste0(prefix, seq_len(nr))
    else character()
  }
}

which yields NULL for dimnames(as.matrix(test))[[1]] but yields "1" "3" in the case of dimnames(as.matrix(test[c(1, 3), ]))[[1]].

Note that for data frames, e.g. rownames(test), the method base:::row.names.data.frame is applied instead.

That should explain the WHAT. Fortunately you did not ask for the WHY, which would be rather opinion-based.
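
The two dimnames can also be inspected directly. The following is a minimal sketch using the test data frame from the question; the variable names m_full, m_subset, dn_full and dn_subset are only illustrative.

```r
# Data frame from the question
test <- as.data.frame(list(x = c(0.1, 0.1, 1), y = c(0.1, 0.2, 0.3)))

m_full   <- as.matrix(test)             # row names were the default 1..n
m_subset <- as.matrix(test[c(1, 3), ])  # row names "1" and "3"

# rownames() returns the first component of dimnames()
dn_full   <- dimnames(m_full)[[1]]
dn_subset <- dimnames(m_subset)[[1]]

is.null(dn_full)   # TRUE: no row names were stored in the matrix
dn_subset          # the row names "1" and "3" survived the conversion
```

The column names are stored in the second dimnames component and are kept in both cases.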

CodePudding user response：

I don't know exactly why it happens, but one way to fix it is to pass the argument rownames.force = TRUE to as.matrix:

rownames(as.matrix(test, rownames.force = TRUE))
# [1] "1" "2" "3"

CodePudding user response：

The difference dataframe vs. matrix:

?rownames

rownames(x, do.NULL = TRUE, prefix = "row")

The important part is the do.NULL argument, whose default is TRUE. The documentation says:

If do.NULL is FALSE, a character vector (of length NROW(x) or NCOL(x)) is returned in any case,

If the replacement versions are called on a matrix without any existing dimnames, they will add suitable dimnames. But constructions such as

rownames(x)[3] <- "c"

may not work unless x already has dimnames, since this will create a length-3 value from the NULL value of rownames(x).
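
The caveat quoted above can be tried out with a small sketch. The 4-row matrix m is a made-up example, not part of the question; with 4 rows, the length-3 value built from NULL does not fit the matrix, so the assignment is expected to fail until row names exist.

```r
# Hypothetical matrix without any dimnames
m <- matrix(1:8, nrow = 4)

# rownames(m) is NULL, so rownames(m)[3] <- "c" builds a length-3 vector,
# which does not match the 4 rows; tryCatch() captures the resulting error
res <- tryCatch({
  rownames(m)[3] <- "c"
  "ok"
}, error = function(e) "failed")

res

# Once all row names are set, replacing a single one works
rownames(m) <- c("a", "b", "c", "d")
rownames(m)[3] <- "z"
rownames(m)
```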

To me that means (maybe not correct or professional) that before you apply rownames() to a matrix, its row names must already have been set; otherwise you get NULL, because that is the default behaviour of rownames().

This is the behaviour you see in your example. Here you select rows 1 and 3 and get "1" and "3":

rownames(as.matrix(test[c(1, 3), ]))
[1] "1" "3"

Here you declare nothing and get NULL because NULL is the default.

rownames(as.matrix(test))
NULL

You can overcome this by setting the row names explicitly beforehand:

rownames(test) <- 1:3

rownames(as.matrix(test))
[1] "1" "2" "3"

or you could do:

rownames(as.matrix(test), do.NULL = FALSE)
[1] "row1" "row2" "row3"

rownames(as.matrix(test), do.NULL = FALSE, prefix = "")
[1] "1" "2" "3"

You get a similar effect with the rownames.force argument of as.matrix. From ?as.matrix:

rownames.force: logical indicating if the resulting matrix should have character (rather than NULL) rownames. The default, NA, uses NULL rownames if the data frame has ‘automatic’ row.names or for a zero-row data frame.
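
Whether a data frame has 'automatic' row names can be checked with .row_names_info(), which according to ?row.names returns the number of rows with a negative sign for automatic row names. The sketch below reuses the test data frame from the question; the variable names are only illustrative.

```r
test <- as.data.frame(list(x = c(0.1, 0.1, 1), y = c(0.1, 0.2, 0.3)))

# Negative value: automatic row names; positive value: explicit row names
info_full   <- .row_names_info(test)
info_subset <- .row_names_info(test[c(1, 3), ])

# Default (rownames.force = NA): automatic row names become NULL
rn_default <- rownames(as.matrix(test))

# rownames.force = TRUE: always keep character row names
rn_forced <- rownames(as.matrix(test, rownames.force = TRUE))

# rownames.force = FALSE: always drop them, even explicit ones
rn_dropped <- rownames(as.matrix(test[c(1, 3), ], rownames.force = FALSE))
```

Here info_full is negative and info_subset is positive, which matches the NULL versus "1" "3" results in the question.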

CodePudding user response：

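You can set rownames.force = TRUE when you apply as.matrix. The abbreviated form rownames = TRUE also works on a data frame, but only because R partially matches it to rownames.force:

as.matrix(test, rownames.force = TRUE)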
    x   y
1 0.1 0.1
2 0.1 0.2
3 1.0 0.3