I noticed that if the row names of a data frame are the sequence of numbers from 1 to the number of rows, they disappear after calling as.matrix. But the row names are kept if they are not such a sequence.
Here is a reproducible example:
test <- as.data.frame(list(x=c(0.1, 0.1, 1), y=c(0.1, 0.2, 0.3)))
rownames(test)
# [1] "1" "2" "3"
rownames(as.matrix(test))
# NULL
rownames(as.matrix(test[c(1, 3), ]))
# [1] "1" "3"
Does anyone have an idea on what is going on?
Thanks a lot
CodePudding user response:
First and foremost, a matrix always has a numerical index for subsetting. It never disappears and should not be confused with row names.
as.matrix(test)[c(1, 3), ]
#        x   y
# [1,] 0.1 0.1
# [2,] 1.0 0.3
WHAT is going on when you use rownames is the dimnames lookup in the source code of base:::rownames():
function (x, do.NULL = TRUE, prefix = "row")
{
    dn <- dimnames(x)
    if (!is.null(dn[[1L]]))
        dn[[1L]]
    else {
        nr <- NROW(x)
        if (do.NULL)
            NULL
        else if (nr > 0L)
            paste0(prefix, seq_len(nr))
        else character()
    }
}
which yields NULL for dimnames(as.matrix(test))[[1]] but yields "1" "3" in the case of dimnames(as.matrix(test[c(1, 3), ]))[[1]].
Note that the method base:::row.names.data.frame is applied in the case of data frames, e.g. rownames(test).
That should explain the WHAT. Fortunately you did not ask for the WHY, which would be rather opinion-based.
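To check this directly, one can look at the first dimnames component of both matrices. This is a minimal sketch that reuses test from the question; the names m_full and m_sub are only illustrative.

```r
test <- as.data.frame(list(x = c(0.1, 0.1, 1), y = c(0.1, 0.2, 0.3)))

m_full <- as.matrix(test)             # all rows, default row names
m_sub  <- as.matrix(test[c(1, 3), ])  # subset of rows 1 and 3

# rownames() only reports the first dimnames component
dimnames(m_full)[[1]]
# NULL
dimnames(m_sub)[[1]]
# [1] "1" "3"

# the column names survive in both cases
colnames(m_full)
# [1] "x" "y"
```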
CodePudding user response:
I don't know exactly why it happens, but one way to fix it is to pass the argument rownames.force = TRUE to as.matrix:
rownames(as.matrix(test, rownames.force = TRUE))
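For comparison, here is a small sketch of how the three possible values of rownames.force behave on the data from the question. It assumes only base R and the test data frame defined in the question.

```r
test <- as.data.frame(list(x = c(0.1, 0.1, 1), y = c(0.1, 0.2, 0.3)))

# default (NA): the default 1..n row names are dropped
rownames(as.matrix(test))
# NULL

# TRUE: row names are always kept
rownames(as.matrix(test, rownames.force = TRUE))
# [1] "1" "2" "3"

# FALSE: row names are always dropped, even for a subset
rownames(as.matrix(test[c(1, 3), ], rownames.force = FALSE))
# NULL
```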
CodePudding user response:
The difference between a data frame and a matrix:
?rownames
rownames(x, do.NULL = TRUE, prefix = "row")
The important part is do.NULL = TRUE, which is the default. The documentation says:
If do.NULL is FALSE, a character vector (of length NROW(x) or NCOL(x)) is returned in any case,
If the replacement versions are called on a matrix without any existing dimnames, they will add suitable dimnames. But constructions such as
rownames(x)[3] <- "c"
may not work unless x already has dimnames, since this will create a length-3 value from the NULL value of rownames(x).
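The caveat in the quoted documentation can be reproduced with a small matrix. This is only a sketch: the matrix m and its 4 x 2 shape are made up for illustration, and try() is used so the failing line does not stop the script.

```r
m <- matrix(1:8, nrow = 4)  # a matrix without any dimnames

# rownames(m) is NULL, so the indexed assignment builds a length-3
# vector for a 4-row matrix, which is rejected
res <- try(rownames(m)[3] <- "c", silent = TRUE)
inherits(res, "try-error")
# [1] TRUE

# give the matrix default row names first, then replace one of them
rownames(m) <- rownames(m, do.NULL = FALSE)
rownames(m)[3] <- "c"
rownames(m)
# [1] "row1" "row2" "c"    "row4"
```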
To me that means (maybe not correct or professional) that rownames() applied to a matrix returns NULL unless the matrix already has row names in its dimnames, because do.NULL = TRUE is the default setting of rownames().
In your example you see exactly this behaviour. Here you select rows 1 and 3 and get "1" and "3":
rownames(as.matrix(test[c(1, 3), ]))
[1] "1" "3"
Here you select nothing and get NULL, because NULL is the default:
rownames(as.matrix(test))
NULL
You can overcome this by setting the row names explicitly beforehand:
rownames(test) <- 1:3
rownames(as.matrix(test))
[1] "1" "2" "3"
or you could do:
rownames(as.matrix(test), do.NULL = FALSE)
[1] "row1" "row2" "row3"
rownames(as.matrix(test), do.NULL = FALSE, prefix="")
[1] "1" "2" "3"
A similar effect is available through the rownames.force argument of as.matrix:
rownames.force
logical indicating if the resulting matrix should have character (rather than NULL) rownames. The default, NA, uses NULL rownames if the data frame has ‘automatic’ row.names or for a zero-row data frame.
dimnames(matrix_test)
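What the quoted documentation calls 'automatic' row.names can be inspected with the base R helper .row_names_info(), which reports a negative count when the row names are automatic. A rough sketch, assuming test as defined in the question; the name test_sub is only illustrative.

```r
test <- as.data.frame(list(x = c(0.1, 0.1, 1), y = c(0.1, 0.2, 0.3)))

# negative value: the row names are 'automatic', so as.matrix() drops them
.row_names_info(test)
# [1] -3

# after subsetting, the row names are stored explicitly (positive value),
# so as.matrix() keeps them
test_sub <- test[c(1, 3), ]
.row_names_info(test_sub)
# [1] 2
```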
CodePudding user response:
You can pass rownames = TRUE (which partially matches the rownames.force argument) when you apply as.matrix:
as.matrix(test, rownames = TRUE)
    x   y
1 0.1 0.1
2 0.1 0.2
3 1.0 0.3