What are the typical use-cases for "attributes" and "classes" in R?

Time:09-30

From Advanced R section 3.3:

You might have noticed that the set of atomic vectors does not include a number of important data structures like matrices, arrays, factors, or date-times. These types are built on top of atomic vectors by adding attributes.

From Advanced R section 3.4:

One of the most important vector attributes is class, which underlies the S3 object system. Having a class attribute turns an object into an S3 object, which means it will behave differently from a regular vector when passed to a generic function. Every S3 object is built on top of a base type, and often stores additional information in other attributes.

So it seems that "class" is more than just another attribute. On the other hand, it seems a data.frame can be stored as an attribute but cannot really serve as a class. What are the typical use cases, then? (I could look at existing code, of course, but that feels more like bad reverse engineering than knowing what I am doing.)

Example: Should I use

mydata <- c(1:10)
attr(mydata, "x") <- "x-attribute"
attributes(mydata)
# $x
# [1] "x-attribute"

or

mydata <- c(1:10)
class(mydata) <- "x"
class(mydata)
# [1] "x"
attributes(mydata)
# $class
# [1] "x"

In the latter case,

print.x <- function(x, ...) print.default(paste0("This is ", paste(x, collapse = " / ")))
plot.x <- function(x, ...) plot.default(rep(1, length(x)), x, xlab = "great", main = "x")
print(mydata)
# [1] "This is 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9 / 10"
plot(mydata)

works as expected. But which is "better"?

A data.frame attribute does not seem useful:

mydata <- c(1:10)
attr(mydata, "x") <- "x-attribute"
attr(mydata, "y") <- data.frame(x=c(1, 2), y=c(3, 4))
attributes(mydata)
# $x
# [1] "x-attribute"
# 
# $y
#   x y
# 1 1 3
# 2 2 4

(I can't get a print.??? or plot.??? method to work with this...)

CodePudding user response:

The "class" attribute is what determines generic method dispatch. A data frame has the "class" attribute set to the string "data.frame", which is what allows generic functions like format, print and even mathematical operators to treat it differently from, say, a numeric vector.
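
As a rough illustration using only base R (the variable names df and raw are made up for this sketch), removing the class attribute from a data frame leaves an ordinary list, which generic functions no longer treat as a data frame:

```r
df <- data.frame(a = 1:2, b = c("u", "v"))

# The class attribute is what print(), format(), etc. dispatch on
class(df)
inherits(df, "data.frame")

# The base type underneath is a plain list
typeof(df)

# unclass() drops the class attribute; what remains is a list that still
# carries the other attributes but is no longer treated as a data frame
raw <- unclass(df)
is.data.frame(raw)
names(attributes(raw))
```

Printing raw at the console would then go through the default print method and show a list with its attributes, rather than the familiar tabular layout.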

Attributes other than class are used in different ways, often to store information that the class needs to work properly but can be hidden from the user when not required.
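
Two base R cases may make this concrete (a sketch; f and m are just illustrative names). A factor is an integer vector whose "levels" attribute holds the labels that its methods use, and a matrix is a vector with a "dim" attribute and no class attribute at all:

```r
f <- factor(c("b", "a", "b"))

# A factor is stored as integer codes plus "levels" and "class" attributes
typeof(f)
attributes(f)

# Dropping the class exposes the codes; the levels attribute is still attached
unclass(f)

# A matrix is just a vector with a "dim" attribute
m <- 1:6
dim(m) <- c(2, 3)
is.matrix(m)
attributes(m)
```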

For example, a grouped tibble stores its row names in the row.names attribute, its column names as a character vector in the names attribute, and its grouping structure as a tibble in an attribute called groups. The latter is a good example of a data frame that is stored as an attribute and does useful work.

library(dplyr)  # provides as_tibble(), group_by() and %>%
as_tibble(iris) %>% group_by(Species) %>% attr('groups')
#> # A tibble: 3 x 2
#>   Species          .rows
#>   <fct>      <list<int>>
#> 1 setosa            [50]
#> 2 versicolor        [50]
#> 3 virginica         [50]

This attribute is normally hidden from end-users, because it is not printed by the tibble's print method, but it is vital for a grouped tibble to work as it should.

It is easy to set up an example to show how class and other attributes can be used in a single object. Suppose we want to timestamp an object when it is created, so that it keeps a record of its creation date, but we don't want the date to be visible unless the user specifically asks for it.

First, we write a constructor function (this is generally a better way to create a class than setting the class attribute by hand):

timestamp <- function(x) {
  structure(x, created_on = Sys.Date(), class = c('timestamp', class(x)))
}

Now we create a print method that doesn't display the timestamp.

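print.timestamp <- function(x, ...) {
  # Strip the timestamp attribute and our class from the local copy only
  attr(x, 'created_on') <- NULL
  class(x) <- class(x)[-1]
  # NextMethod() hands the modified x to the next print method
  NextMethod()
}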

Finally, we can create a small generic function to check the timestamp on an object. If the object is not timestamped, it should signal an error:

creation_date <- function(x) UseMethod('creation_date')

creation_date.default <- function(x) stop('No timestamp on this object')

creation_date.timestamp <- function(x) {
  attr(x, 'created_on')
}

So testing now, we can create a timestamped object, which looks exactly like a non-timestamped object:

object <- timestamp(1:10)

object
#> [1]  1  2  3  4  5  6  7  8  9 10

But under the hood it carries extra information that the object might need to function as expected:

creation_date(object)
#> [1] "2022-09-30"
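
For completeness, here is a sketch of the other two behaviours described above: the timestamp is still an ordinary attribute that attributes() will reveal, and an object without the class falls through to the default method and raises the error. The definitions are repeated so the block runs on its own; result is just an illustrative variable name.

```r
# Definitions repeated from above so this block is self-contained
timestamp <- function(x) {
  structure(x, created_on = Sys.Date(), class = c("timestamp", class(x)))
}
creation_date <- function(x) UseMethod("creation_date")
creation_date.default <- function(x) stop("No timestamp on this object")
creation_date.timestamp <- function(x) attr(x, "created_on")

object <- timestamp(1:10)

# The print method hides the timestamp, but it is an ordinary attribute
names(attributes(object))

# A plain vector has no "timestamp" class, so the default method is used
# and signals an error; tryCatch() captures the message instead of stopping
result <- tryCatch(creation_date(1:10), error = function(e) conditionMessage(e))
result
```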