Accessing column of a data frame doesn't require the full name of that column

Time:09-23

Here's my issue :

df <- data.frame(xabc = c(1,2,3), yabc = c(4,5,6))

A column of this data.frame is usually accessed with df$xabc. After making a typo, I noticed that df$x returns that exact same column as well...

My questions are:

  • What is this behavior called?
  • What kinds of errors or mistakes could arise from a typo?
  • Is there any way to raise an error when the column name after the $ doesn't exist?

Thanks in advance.

CodePudding user response:

From ?`[`:

Both [[ and $ select a single element of the list. The main difference is that $ does not allow computed indices, whereas [[ does. x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument.

Below you can see that the first and third calls are equivalent. The $ operator always uses exact = FALSE, whereas with [[ you can set exact explicitly to control this behavior; its default is TRUE.

The documentation quoted above refers to this as partial matching. As @akrun comments, it can produce unpredictable and undesirable results, so it's better to enforce exact matching or, if fuzzy matching is really required, to do it in a more explicitly controlled way.

d <- data.frame(xabc = c(1,2,3), yabc = c(4,5,6))

d$x
#> [1] 1 2 3
d[["x"]]
#> NULL
d[["x", exact = FALSE]]
#> [1] 1 2 3
d[["x", exact = TRUE]]
#> NULL

Created on 2022-09-22 by the reprex package (v2.0.1)
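
The question about typos and about raising an error isn't covered by the example above, so here is a rough sketch of both. It first shows two ways partial matching can bite (an ambiguous prefix, and assignment through $), then two ways to tighten things up. warnPartialMatchDollar is a standard R option; strict_col is a hypothetical helper written only for this illustration, not something base R provides.

```r
d <- data.frame(xabc = c(1, 2, 3), yabc = c(4, 5, 6))

# Pitfall 1: an ambiguous prefix silently yields NULL instead of a column.
d2 <- data.frame(xabc = 1:3, xdef = 4:6)
ambiguous <- d2$x   # "x" is a prefix of both xabc and xdef

# Pitfall 2: assignment never partial-matches, so this adds a new
# column named "x" rather than overwriting xabc.
d3 <- d
d3$x <- 0
new_names <- names(d3)

# Option A: ask R to warn whenever `$` falls back to a partial match.
old <- options(warnPartialMatchDollar = TRUE)
warned <- tryCatch(d$x, warning = function(w) conditionMessage(w))
options(old)        # restore the previous setting

# Option B (hypothetical helper): stop unless the name matches exactly.
strict_col <- function(df, name) {
  if (!name %in% names(df)) {
    stop("no column named '", name, "'", call. = FALSE)
  }
  df[[name, exact = TRUE]]
}

ok  <- strict_col(d, "xabc")
err <- tryCatch(strict_col(d, "x"), error = function(e) conditionMessage(e))
```

Option A only produces a warning; if an actual error is wanted, options(warn = 2) promotes all warnings to errors, but that setting is global and affects everything else in the session. Another route, assuming an extra dependency is acceptable, is to use a tibble, whose $ does not partial-match.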
