Why won't R recognize data frame column names within lists?

Time:01-03

HEADLINE: Is there a way to get R to recognize data.frame column names contained within lists in the same way that it can recognize free-floating vectors?

SETUP: Say I have a vector named varA:

(varA <- 1:6)
# [1] 1 2 3 4 5 6

To get the length of varA, I could do:

length(varA)
#[1] 6

and if the variable's name were stored as a string in a larger list, the variable and its length could still be found by doing:

list <- list(vars = "varA")
length(get(list$vars[1]))
#[1] 6

PROBLEM: This is not the case when I replace the vector with a data frame column, and I don't know how to work around this:

rows <- 1:6
cols <- c("colA")
(df <- data.frame(matrix(NA, 
                         nrow = length(rows), 
                         ncol = length(cols), 
                         dimnames = list(rows, cols))))
#   colA
# 1   NA
# 2   NA
# 3   NA
# 4   NA
# 5   NA
# 6   NA

list <- list(vars = "varA", 
             cols = "df$colA")
length(get(list$vars[1]))
#[1] 6
length(get(list$cols[1]))
#Error in get(list$cols[1]) : object 'df$colA' not found

This contrived example may seem inane, since I could always call length(variable) directly. My actual goal is to write data from hundreds of variables of varying lengths into their respective data frame columns, so keeping their names in a list that I can iterate over would be very helpful. I've tried everything I could think of and cannot find any posts that solve the issue, so it may simply not be possible in R.

CodePudding user response:

You could try:

length(eval(parse(text = list$cols[1])))
#[1] 6

Or:

list <- list(vars = "varA", 
             cols = "colA")

length(df[, list$cols[1]])
#[1] 6

Or with regex:

list <- list(vars = "varA", 
             cols = "df$colA")
length(df[, sub(".*\\$", "", list$cols[1])])
#[1] 6
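
For context, get() fails in the question because it looks up a single object name, whereas "df$colA" is an expression (a call to `$`), not a name. Below is a minimal sketch that avoids eval(parse()) by splitting the string into its object and column parts. The helper name resolve_ref is hypothetical, and the sketch assumes every reference is either a bare name or of the form object$column:

```r
# Hypothetical helper: resolve "name" or "object$column" without eval(parse()).
resolve_ref <- function(ref, envir = parent.frame()) {
  parts <- strsplit(ref, "$", fixed = TRUE)[[1]]
  obj <- get(parts[1], envir = envir)      # look up the object by its name
  if (length(parts) == 1L) return(obj)     # bare variable name, nothing to extract
  obj[[parts[2]]]                          # extract the column by name
}

varA <- 1:6
df <- data.frame(colA = rep(NA, 6))
refs <- list(vars = "varA", cols = "df$colA")

length(resolve_ref(refs$vars[1]))
# [1] 6
length(resolve_ref(refs$cols[1]))
# [1] 6
```

Storing the data frame name and the column name as separate list elements would make the string splitting unnecessary.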

CodePudding user response:

If you are truly working with a data frame d, then nrow(d) is the length of all of the variables in d. There should be no reason to use length in this case.
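
A small illustration of that point, using a made-up data frame d (the column names are assumptions for the example). Note that length() applied to the data frame itself counts columns, not rows:

```r
d <- data.frame(colA = rep(NA, 6), colB = 1:6)

nrow(d)          # number of rows, shared by every column
# [1] 6
length(d$colA)   # length of a single column
# [1] 6
length(d)        # number of columns, not rows
# [1] 2
```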

If you are actually working with a list x containing variables of potentially different lengths, then you should use the [[ operator to extract those variables by name (see ?Extract):

x <- list(a = 1:10, b = rnorm(20L))
l <- list(vars = "a")
length(x[[l$vars[1L]]]) # 10

If you insist on using get (you shouldn't), then you need to supply a second argument telling it where to look for the variable (see ?get):

length(get(l$vars[1L], x)) # 10
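
For the goal stated in the question (writing variables of different lengths into data frame columns), one possible approach is to keep the vectors themselves in a named list and pad them with NA to a common length. This is only a sketch with invented data; the names vars, padded, and out are assumptions:

```r
# Illustrative data: variables of different lengths kept in one named list.
vars <- list(a = 1:3, b = c(2.5, 4), c = 1:6)

n <- max(lengths(vars))                 # the longest variable sets the row count
padded <- lapply(vars, `length<-`, n)   # extend shorter vectors with NA
out <- as.data.frame(padded)            # one column per list element

dim(out)
# [1] 6 3
sum(is.na(out$b))                       # b had 2 values, so 4 were padded
# [1] 4
```

Because the list holds the data rather than strings naming the data, neither get() nor eval(parse()) is needed when iterating over it.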