When I select a row by its row name, 'CYP3A4', a different row with a similar name is selected instead. Is this a bug in R or the default behaviour? How can I avoid it?
CodePudding user response:
The single quotation marks are not part of the name; they just indicate that you have an object of class character representing text, rather than a symbol named CYP3A4.
x <- data.frame(var = seq(4))
rownames(x) <- c("foo", "'foo'", "`foo`", "\"foo\"")
x
#> var
#> foo 1
#> 'foo' 2
#> `foo` 3
#> "foo" 4
x["foo",]
#> [1] 1
x["'foo'",]
#> [1] 2
x["`foo`",]
#> [1] 3
x["\"foo\"",]
#> [1] 4
Created on 2022-04-08 by the reprex package (v2.0.0)
CodePudding user response:
What you're seeing is "partial matching" that occurs when your data is in a dataframe rather than a matrix. Compare for example the behaviour of a matrix and dataframe with:
matrix.a3 <- matrix(1:10, 2, 5, dimnames=list(c("CYP3A33", "CYP3A43"),
paste0("P01",1:5)))
a3 <- data.frame(matrix.a3)
a3
# P011 P012 P013 P014 P015
# CYP3A33 1 3 5 7 9
# CYP3A43 2 4 6 8 10
a3['CYP3A4',1:5]
# P011 P012 P013 P014 P015
#CYP3A43 2 4 6 8 10
matrix.a3['CYP3A4',1:5]
#Error in matrix.a3["CYP3A4", 1:5] : subscript out of bounds
This is mentioned in the help file for [.data.frame, which says:
Both [ and [[ extraction methods partially match row names. By default neither partially match column names, but [[ will if exact = FALSE (and with a warning if exact = NA). If you want to exact matching on row names use match, as in the examples.
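As a rough illustration of that passage, the sketch below contrasts row-name and column-name matching. The data frame df and its column name value are made up for this example and are not from the question.

```r
# Hypothetical one-column data frame, used only for illustration
df <- data.frame(value = 1:2, row.names = c("CYP3A33", "CYP3A43"))

# Row names are partially matched: "CYP3A4" is a unique prefix of "CYP3A43"
partial_row <- df["CYP3A4", ]

# Column names are matched exactly by default, so an abbreviation finds nothing
exact_col <- df[["val"]]

# With exact = FALSE, [[ partially matches the column name
partial_col <- df[["val", exact = FALSE]]
```

Here partial_row picks up the value from the CYP3A43 row, exact_col is NULL, and partial_col is the whole value column.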
The easiest solution would be to store your data in a matrix if it is all one type, but if that's not suitable you can work around it by selecting with a logical index rather than by string:
a3[rownames(a3) == "CYP3A4", 1:5]
#[1] P011 P012 P013 P014 P015
#<0 rows> (or 0-length row.names)
or by using match:
a3[match("CYP3A4", row.names(a3)), ]
# P011 P012 P013 P014 P015
#NA NA NA NA NA NA
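Note that the two approaches give different results when there is no exact match: the logical index returns a zero-row data frame, whereas match returns a row of NAs. Use whichever is more useful for you.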
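If you would rather get an error than an empty or all-NA result, the match approach can be wrapped in a small helper. This is only a sketch: exact_rows is a hypothetical name, not a base R function, and stopping on a missing name is a design choice you may not want.

```r
# Hypothetical helper: select rows by exact row name, failing loudly on misses
exact_rows <- function(df, names) {
  idx <- match(names, rownames(df))   # exact matching only; NA when absent
  if (anyNA(idx)) {
    stop("row name(s) not found: ",
         paste(names[is.na(idx)], collapse = ", "))
  }
  df[idx, , drop = FALSE]             # keep a data frame even for one column
}

# Same example data as above
a3 <- data.frame(matrix(1:10, 2, 5,
                        dimnames = list(c("CYP3A33", "CYP3A43"),
                                        paste0("P01", 1:5))))

hit <- exact_rows(a3, "CYP3A43")                      # exact name: one row
miss <- try(exact_rows(a3, "CYP3A4"), silent = TRUE)  # no exact match: error
```

With this, asking for "CYP3A4" stops with a message instead of silently returning the CYP3A43 row.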