Bug about a wrong row (or column) was selected in data.frame with character index (rowname or colnam

Time:04-08

When I try to select the row of a data.frame whose row name is 'CYP3A4', a different row with a similar name is selected instead. Is this a bug in R or its default behaviour? How can I avoid it?

CodePudding user response:

The single quotation marks do not belong to the name. They only indicate that you have an object of class character, representing text, rather than a symbol named CYP3A4.

x <- data.frame(var = seq(4))
rownames(x) <- c("foo", "'foo'", "`foo`", "\"foo\"")
x
#>       var
#> foo     1
#> 'foo'   2
#> `foo`   3
#> "foo"   4
x["foo",]
#> [1] 1
x["'foo'",]
#> [1] 2
x["`foo`",]
#> [1] 3
x["\"foo\"",]
#> [1] 4

Created on 2022-04-08 by the reprex package (v2.0.0)
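
As a small additional check of the point above, the sketch below shows that the quotes only delimit the string and are not stored in it. It uses base R functions only; the example strings are chosen for illustration.

```r
# Single and double quotes delimit the same character string.
same <- identical('CYP3A4', "CYP3A4")
same
#> [1] TRUE

# The delimiting quotes are not stored: the string has six characters.
n_plain <- nchar('CYP3A4')
n_plain
#> [1] 6

# A name that really contains quote characters is a different, longer string.
n_quoted <- nchar("'CYP3A4'")
n_quoted
#> [1] 8
```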

CodePudding user response:

What you're seeing is "partial matching" of row names, which happens when your data is in a data frame but not when it is in a matrix. Compare, for example, the behaviour of a matrix and a data frame:

matrix.a3 <- matrix(1:10, 2, 5, dimnames=list(c("CYP3A33", "CYP3A43"),
                                     paste0("P01",1:5)))

a3 <- data.frame(matrix.a3)
a3
#         P011 P012 P013 P014 P015
# CYP3A33    1    3    5    7    9
# CYP3A43    2    4    6    8   10

a3['CYP3A4',1:5]
#        P011 P012 P013 P014 P015
#CYP3A43    2    4    6    8   10

matrix.a3['CYP3A4',1:5]
#Error in matrix.a3["CYP3A4", 1:5] : subscript out of bounds

This is mentioned in the help file for [.data.frame, which says:

Both [ and [[ extraction methods partially match row names. By default neither partially match column names, but [[ will if exact = FALSE (and with a warning if exact = NA). If you want to exact matching on row names use match, as in the examples.
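
The matching rules can be explored directly with base R's pmatch(). This is only a sketch: it assumes that character row indices on a data frame follow the same rules as pmatch(), which agrees with the behaviour shown above but is not stated in the quoted help text.

```r
rn <- c("CYP3A33", "CYP3A43")

# Unique partial match: "CYP3A4" is a prefix of "CYP3A43" only.
unique_hit <- pmatch("CYP3A4", rn)
unique_hit
#> [1] 2

# Ambiguous prefix: "CYP3A" matches both names, so the result is NA.
ambiguous <- pmatch("CYP3A", rn)
ambiguous
#> [1] NA

# An exact match takes precedence over a partial one.
exact_first <- pmatch("CYP3A4", c("CYP3A4", "CYP3A43"))
exact_first
#> [1] 1
```

Under that assumption, partial matching only changes the result when the requested name is not itself a row name.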

The easiest solution would be to store your data in a matrix if it is all one type. If that's not suitable, you can work around it by selecting with a logical index rather than a string:

a3[rownames(a3) == "CYP3A4", 1:5]
#[1] P011 P012 P013 P014 P015
#<0 rows> (or 0-length row.names)

or by using match:

a3[match("CYP3A4", row.names(a3)), ] 
#   P011 P012 P013 P014 P015
#NA   NA   NA   NA   NA   NA

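Note that the two approaches give different results when the name is absent: the logical index returns a data frame with zero rows, whereas match returns a single row of NA values. Use whichever is more useful for you.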
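
If a missing row name should be treated as an error rather than silently returning an empty or NA row, the match approach can be wrapped in a small function. The helper below, select_rows_exact, is a hypothetical name and not part of base R; it is a minimal sketch assuming the row names are unique.

```r
# Hypothetical helper (not part of base R): select rows by exact name and
# stop with an error if any requested name is absent.
select_rows_exact <- function(df, row_names) {
  idx <- match(row_names, rownames(df))  # exact matching only
  if (anyNA(idx)) {
    stop("row name(s) not found: ",
         paste(row_names[is.na(idx)], collapse = ", "))
  }
  df[idx, , drop = FALSE]
}

a3 <- data.frame(matrix(1:10, 2, 5,
                        dimnames = list(c("CYP3A33", "CYP3A43"),
                                        paste0("P01", 1:5))))

# An existing name returns exactly that row.
hit <- select_rows_exact(a3, "CYP3A43")

# A partial name is rejected instead of being matched to "CYP3A43".
miss <- try(select_rows_exact(a3, "CYP3A4"), silent = TRUE)
inherits(miss, "try-error")
#> [1] TRUE
```

Using drop = FALSE keeps the result a data frame even when the data frame has a single column.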
