Combine all possible pairs in one dataframe in R

I am trying to create a loop that collects all possible pairs in one data frame, because I want to use those pairs later with lm() and adf.test(). As an example, I have a data frame as follows: df <- as.data.frame(cbind(1, 2, 3, 4)).

From this I want to get all possible pairs, like this: pairs <- as.data.frame(cbind(c(1, 1, 1, 2, 2, 3), c(2, 3, 4, 3, 4, 4))).

To accomplish this I have tried several variations of a for loop similar to this:

all_pairs = matrix(0, ((length(df)*(length(df)-1))/2), 2)
for (ij in 1:((length(df)*(length(df)-1))/2)) {
  for (i in 1:(length(df)-1)) {
    for (j in (i + 1):length(df)) {
      all_pairs[ij, 1] = df[i,]
      all_pairs[ij,2] = df[j,]
    }
  }
}

The reason for ((length(df)*(length(df)-1))/2) is that n(n-1)/2 is how I would calculate the number of combinations without replacement.
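
As a quick check of that count, here is a minimal sketch in base R. The value n = 4 is an assumption chosen to match the four values in the example data frame, and the variable names are illustrative only.

```r
n <- 4
closed_form <- n * (n - 1) / 2     # formula used in the question
builtin     <- choose(n, 2)        # binomial coefficient from base R
generated   <- ncol(combn(n, 2))   # number of pairs combn() actually produces
c(closed_form, builtin, generated) # all three values should agree
```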

As mentioned earlier, I have tried a few ways of doing this, but none of them work. Is this a good method for accomplishing my goal? If yes, how could I make it work?

Thanks in advance!

CodePudding user response：

Use combn with m = 2 to get all pairs:

data.frame(t(combn(1:4, m = 2)))
  X1 X2
1  1  2
2  1  3
3  1  4
4  2  3
5  2  4
6  3  4
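
Since the question mentions feeding the pairs to lm() later, the following is one possible sketch of that step. The data frame dat, its column names, and the random values are made up for illustration; the same pattern should carry over to adf.test() on the residuals, but that part is not shown here.

```r
# Hypothetical data: four numeric series stored as columns.
set.seed(1)
dat <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50), d = rnorm(50))

# One row per unordered pair of column names.
pairs <- data.frame(t(combn(names(dat), m = 2)), stringsAsFactors = FALSE)
names(pairs) <- c("y", "x")

# Fit y ~ x for every pair and keep the fitted models in a list.
fits <- lapply(seq_len(nrow(pairs)), function(k) {
  lm(reformulate(pairs$x[k], response = pairs$y[k]), data = dat)
})
names(fits) <- paste(pairs$y, pairs$x, sep = "~")

# Slope of each pairwise regression.
slopes <- sapply(fits, function(f) coef(f)[[2]])
slopes
```

Working with column names rather than values keeps each pair usable directly in a formula.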

CodePudding user response：

Try combn

> as.data.frame(t(combn(df, 2)))
  V1 V2
1  1  2
2  1  3
3  1  4
4  2  3
5  2  4
6  3  4
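
For comparison, the nested loop from the question can also be made to work. This is only a sketch of one way to repair it, assuming the one-row df from the question: the outer loop over ij is replaced by a running counter, and the values are read from columns rather than rows.

```r
df <- as.data.frame(cbind(1, 2, 3, 4))
n <- length(df)                        # number of columns in df
all_pairs <- matrix(0, n * (n - 1) / 2, 2)

ij <- 0                                # running row index, not a third loop
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    ij <- ij + 1
    all_pairs[ij, 1] <- df[1, i]       # df has a single row, so index by column
    all_pairs[ij, 2] <- df[1, j]
  }
}
all_pairs
```

In the original version the two inner loops ran to completion for every value of ij, so every row ended up holding the last pair; combn avoids that bookkeeping entirely.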

CodePudding user response：

From what I understand, you're trying to create all possible combinations of predictor variables and then fit a linear regression model for each. I wrote this function a few days ago, and you may be able to reuse it.

Here x holds the names of the predictor variables and y the name of the target variable. The function returns a table with every combination of predictor variables along with its metrics (RMSE, MAE, R2 and adjusted R2).

# Requires the caTools, dplyr, MLmetrics and stringi packages.
library(caTools)  # sample.split
library(dplyr)    # %>%, arrange, desc

LinearRegressionDA <- function(y, x, DatasetName, Split_Ratio = 0.75) {

  # Reproducible train/test split, stratified on the target column
  set.seed(12334)
  split = sample.split(DatasetName[[y]], SplitRatio = Split_Ratio)
  train = DatasetName[split, , drop = FALSE]
  test = DatasetName[!split, , drop = FALSE]

  # Every non-empty subset of the predictors, as a list of character vectors
  Data_list = do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE))

  # The same subsets as a data frame, one row per subset, padded with ""
  Data_dataframe = data.frame(stringi::stri_list2matrix(Data_list, byrow = TRUE))
  Data_dataframe[is.na(Data_dataframe)] <- ""

  RMSE = list()
  MAE = list()
  Adj_R2 = list()
  R2 = list()

  for (i in seq_along(Data_list)) {

    # Build a formula such as Y1 ~ X1 + X2 and fit it on the training set
    model = lm(as.formula(paste(y, "~", paste(Data_list[[i]], collapse = " + "))), data = train)
    predictions <- model %>% predict(test)

    # Model performance on the test set
    RMSE_ = MLmetrics::RMSE(predictions, test[[y]])
    RMSE = append(RMSE, RMSE_)

    MAE_ = MLmetrics::MAE(predictions, test[[y]])
    MAE = append(MAE, MAE_)

    # Goodness of fit on the training set
    Adj_R2_ = summary(model)$adj.r.squared
    Adj_R2 = append(Adj_R2, Adj_R2_)

    R2_ = summary(model)$r.squared
    R2 = append(R2, R2_)
  }

  Data_dataframe$RMSE = round(unlist(RMSE), 3)
  Data_dataframe$MAE = round(unlist(MAE), 5)
  Data_dataframe$Adj_R2 = round(unlist(Adj_R2), 3)
  Data_dataframe$R2 = round(unlist(R2), 3)

  # Best-fitting combinations first; returned as a one-element list
  list(Data_dataframe %>% arrange(desc(R2)))
}

You can use this function in the following manner:

LinearRegressionDA(y = "Y1", x = c("X1" ,"X2", "X3","X4"), DatasetName = df)[[1]]
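
To see which models the function iterates over, the subset enumeration can be run on its own in base R. This is a small sketch; the predictor names and the target name Y1 are the placeholders from the call above, and formulas is a name introduced here for illustration.

```r
x <- c("X1", "X2", "X3", "X4")

# All non-empty subsets of x: sizes 1, 2, 3 and 4, concatenated into one list.
subsets <- do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE))
length(subsets)   # 4 + 6 + 4 + 1 = 15 subsets

# The model formula each subset corresponds to, as text.
formulas <- vapply(
  subsets,
  function(s) paste("Y1 ~", paste(s, collapse = " + ")),
  character(1)
)
head(formulas, 5)
```

The number of subsets grows as 2^n - 1, so this exhaustive approach is only practical for a modest number of predictors.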