Can I somehow refer to a column by more than one name?

Time:04-19

I am working with two data.frames that use different terminology. To keep each data.frame's terminology intact, I am considering simply adding each one's columns to the other.

df_a <- data.frame(
        A = c("a", "b", "c"),
        B = c("a", "b", "c")
        )

df_b <- data.frame(
        same_as_A = c("a", "b", "c"),
        same_as_B = c("a", "b", "c")
        )

# Build the combined df_a first, so df_b does not receive its own columns back.
df_ab <- cbind(df_a, df_b)
df_b <- cbind(df_b, df_a)
df_a <- df_ab

This will, however, become problematic as soon as I start making changes to any of these columns. Is there instead a way, or even a trick, to refer to a column by more than one name? Obviously this does not work, but something like:

df_a <- data.frame(
        A & same_as_A = c("a", "b", "c"),
        B & same_as_B = c("a", "b", "c")
        )

Where df_a$same_as_A is equal to df_a$A

"a" "b" "c"

Answer:

You can derive your own subclass of data.frame, wrap [ and $, and handle aliases explicitly.

aliases <- function(x, ...) {
  dots <- list(...)
  stopifnot(!is.null(names(dots)), all(nzchar(names(dots))))
  nms <- attr(x, "aliases")
  attr(x, "aliases") <- c(nms[!names(nms) %in% names(dots)], dots)
  if (class(x)[1] != "aliased_dataframe") {
    class(x) <- c("aliased_dataframe", class(x))
  }
  x
}

# df[, "alias"]: translate any aliases in j, then defer to the next method.
`[.aliased_dataframe` <- function(x, i, j, ...) {
  if (!inherits(x, "aliased_dataframe")) return(NextMethod())
  if (!missing(j) && length(j)) {
    aliases <- attr(x, "aliases")
    ind <- j %in% names(aliases)
    j[ind] <- unlist(aliases[ match(j[ind], names(aliases)) ])
  }
  NextMethod(object = x)
}
# df$alias: same translation for reading a column.
`$.aliased_dataframe` <- function(x, j, ...) {
  if (!inherits(x, "aliased_dataframe")) return(NextMethod())
  if (!missing(j) && length(j)) {
    aliases <- attr(x, "aliases")
    ind <- j %in% names(aliases)
    j[ind] <- unlist(aliases[ match(j[ind], names(aliases)) ])
  }
  NextMethod(object = x)
}
# df$alias <- value: same translation for assignment (the value arrives via ...).
`$<-.aliased_dataframe` <- function(x, j, ...) {
  if (!inherits(x, "aliased_dataframe")) return(NextMethod())
  if (!missing(j) && length(j)) {
    aliases <- attr(x, "aliases")
    ind <- j %in% names(aliases)
    j[ind] <- unlist(aliases[ match(j[ind], names(aliases)) ])
  }
  NextMethod(object = x)
}

Demo:

df_b <- data.frame(
        same_as_A = c("a", "b", "c"),
        same_as_B = c("a", "b", "c")
        )

df_b[, "a"]
# Error in `[.data.frame`(df_b, , "a") : undefined columns selected
df_b$a
# NULL
df_b <- aliases(df_b, a="same_as_A", b="same_as_B")
df_b[, "a"]
# [1] "a" "b" "c"
df_b$a
# [1] "a" "b" "c"
df_b$a <- c("A","B","C")
df_b
#   same_as_A same_as_B
# 1         A         a
# 2         B         b
# 3         C         c

Coincidentally, this works with tbl_df as well, but sadly not with data.table variants.

library(tibble) # or dplyr
df_b <- tibble(df_b)
df_b[, "a"]
# Error in `stop_subscript()`:
# ! Can't subset columns that don't exist.
# x Column `a` doesn't exist.
# Run `rlang::last_error()` to see where the error occurred.
df_b$a
# Warning: Unknown or uninitialised column: `a`.
# NULL
df_b <- aliases(df_b, a="same_as_A", b="same_as_B")
df_b[, "a"]
# # A tibble: 3 x 1
#   same_as_A
#   <chr>    
# 1 a        
# 2 b        
# 3 c        
df_b$a
# [1] "a" "b" "c"
df_b$a <- c("A","B","C")
df_b
# # A tibble: 3 x 2
#   same_as_A same_as_B
#   <chr>     <chr>    
# 1 A         a        
# 2 B         b        
# 3 C         c        

I should note that this accommodates explicit use of j=, as in df_b[,"a"]. The shortcut df_b["a"] technically overloads the i= argument; the base [.data.frame correctly infers your intent, but these S3 wrappers do not, so df_b["a"] fails. It is not difficult to add that (just another conditional, perhaps starting with if (missing(j) && !missing(i) && is.character(i))), but for simplicity I'm keeping it out.
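
For illustration, here is a minimal sketch of what that extra conditional could look like. It is an assumption, not part of the code above: the translate() helper is hypothetical, and the nargs() == 2L check is one possible way to tell df["a"] (column selection) apart from df["a", ] (row selection by name). The aliases attribute and class are set by hand so the snippet runs on its own.

```r
# Sketch only: a `[` method for "aliased_dataframe" that also handles df["alias"].
`[.aliased_dataframe` <- function(x, i, j, ...) {
  aliases <- unlist(attr(x, "aliases"))  # named character vector: alias -> real name

  # Hypothetical helper: swap known aliases for real column names.
  translate <- function(nm) {
    hit <- nm %in% names(aliases)
    if (any(hit)) nm[hit] <- unname(aliases[nm[hit]])
    nm
  }

  if (!missing(j) && is.character(j)) {
    j <- translate(j)                    # df[, "alias"]
  } else if (missing(j) && !missing(i) && is.character(i) && nargs() == 2L) {
    i <- translate(i)                    # df["alias"]; nargs() is 3 for df["row", ]
  }
  NextMethod()
}

# Set up the class and attribute by hand (assumed layout: a named list of real names).
df_demo <- data.frame(same_as_A = c("a", "b", "c"),
                      same_as_B = c("x", "y", "z"),
                      stringsAsFactors = FALSE)
attr(df_demo, "aliases") <- list(a = "same_as_A", b = "same_as_B")
class(df_demo) <- c("aliased_dataframe", class(df_demo))

names(df_demo["a"])   # single-bracket shortcut, alias resolved through i
df_demo[, "b"]        # explicit j still works
```

Row subsetting such as df_demo[1:2, ] passes through untouched, because neither branch fires for a numeric i.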

Another note: I did not overload [[, so df_b[["a"]] returns NULL. If that is really important to you, the same methodology can be adapted to cover it as well.
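
As a rough sketch of that adaptation (again an assumption, not tested against every data.frame variant), a [[ method could translate a single character index and then defer to the next method. The attribute and class are set by hand here so the snippet is self-contained.

```r
# Sketch only: a `[[` method for "aliased_dataframe".
`[[.aliased_dataframe` <- function(x, i, ...) {
  aliases <- attr(x, "aliases")
  # Only a single character index can be an alias; anything else passes through.
  if (!missing(i) && is.character(i) && length(i) == 1L &&
      i %in% names(aliases)) {
    i <- aliases[[i]]
  }
  NextMethod()
}

df_demo2 <- data.frame(same_as_A = c("a", "b", "c"), stringsAsFactors = FALSE)
attr(df_demo2, "aliases") <- list(a = "same_as_A")
class(df_demo2) <- c("aliased_dataframe", class(df_demo2))

df_demo2[["a"]]          # alias resolved to same_as_A
df_demo2[["same_as_A"]]  # real name still works
```

Numeric indices such as df_demo2[[1]] are left alone, which matters because printing and formatting a data.frame rely on them internally.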
