Problem with column names in a function in R when running linear discriminant analysis (lda)

Time:11-29

This is my example dataset, where column D is the grouping (factor) variable.

df <- data.frame(A=1:10, B=2:11, C=3:12, D="A")
df[6:10, 4] <- "B"

When I run an LDA directly, it works fine:

library(MASS)  # provides lda()

model <- lda(D ~ B + C, data = df)

print(model)

Call:
lda(D ~ B + C, data = df)

Prior probabilities of groups:
  A   B 
0.5 0.5 

Group means:
  B  C
A 4  5
B 9 10

Coefficients of linear discriminants:
        LD1
B 0.3162278
C 0.3162278

However, when I try to wrap this in a function, I get stuck.

fun1 <- function(x, column){
  model <- lda(column ~ B + C, data = x)
  print(model)
}

I tried several options:

  • With quotes -> fun1(df, "D")
  • Without quotes -> fun1(df, D)

Both calls fail, with the errors shown below:

# fun1(df, "D")
Error in model.frame.default(formula = column ~ B + C, data = x) : 
variable lengths differ (found for 'B')

# fun1(df, D)
Error in model.frame.default(formula = column ~ B + C, data = x) : 
object is not a matrix

What am I doing wrong?

Answer:

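Inside the formula, column is taken literally as a variable name to be looked up in the supplied data; it is not replaced by the value you pass in (e.g. 'D'). With fun1(df, "D"), the formula is still column ~ B + C. Because x has no column named column, R finds the function argument instead, which holds the length-one string "D", and that is why the variable lengths differ. You have to build the formula explicitly so that the value of column is substituted into it, e.g. using reformulate: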

fun1 <- function(x, column){
  form <- reformulate(c('B', 'C'), column)
  model <- lda(form, data = x)
  print(model)
}

fun1(df, 'D')
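
To see what reformulate actually hands to lda, here is a minimal sketch using only base R and the example data from the question. The variable names response, predictors, form, form2 and mf are made up for illustration. The as.formula(paste(...)) line is an alternative way to build the same formula, not something the answer above relies on.

```r
# Example data from the question.
df <- data.frame(A = 1:10, B = 2:11, C = 3:12, D = "A")
df[6:10, 4] <- "B"

# Illustrative inputs: the response and predictor names as strings.
response   <- "D"
predictors <- c("B", "C")

# reformulate() turns the strings into a real formula object.
form <- reformulate(predictors, response = response)
deparse(form)    # "D ~ B + C"
all.vars(form)   # variables the formula refers to

# Alternative (assumption: plain string pasting is acceptable here).
form2 <- as.formula(paste(response, "~", paste(predictors, collapse = " + ")))

# model.frame() is the step that raised the errors in the question;
# with a properly built formula it finds D, B and C in the data.
mf <- model.frame(form, data = df)
dim(mf)
```

Either form or form2 can then be passed as the first argument to lda(form, data = df), as in the function above.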