How to recombine datasets to make them binary in R

I have a working directory (C:/1/Datasets) that contains many datasets. As an example, I provide only three of them.

d1=structure(list(x = c(2L, 3L, 4L, 6L, 5L, 5L), class = c(1L, 1L, 
1L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA, -6L
))

d2=structure(list(x = c(2L, 6L, 5L, 4L, 8L, 6L), class = c(1L, 1L, 
1L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA, -6L
))

d3=structure(list(x = c(5L, 3L, 4L, 4L, 9L, 6L), class = c(1L, 1L, 
1L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA, -6L
))

Each dataset has a metric variable x and a dependent variable class with two categories, 1 and 2. I want to recombine the datasets in the working folder with each other. For example, we combine dataset d1 with datasets d2 and d3. In this union, every value of class in d1 should be set to 1, and every value of class in d2 and d3 should be set to 0, like this:

x   class   dataset
2   1   d1
3   1   d1
4   1   d1
6   1   d1
5   1   d1
5   1   d1
2   0   d2
6   0   d2
5   0   d2
4   0   d2
8   0   d2
6   0   d2
5   0   d3
3   0   d3
4   0   d3
4   0   d3
9   0   d3
6   0   d3

After that, we change the join order: d2 is combined with d1 and d3, so that all values of class in d2 become 1 and all values of class in d1 and d3 become 0.

In the same way, I want to recombine all datasets in the working folder. For example, if there are 4 datasets, it will look something like this:

1 vs 2-3-4
2 vs 1-3-4
3 vs 1-2-4
4 vs 1-2-3

Each time, the class variable of the main dataset is set to 1 and the class variable of the attached datasets is set to 0. As a result we get 4 separate datasets (the number depends on how many datasets are in the working directory).

How can I do such a recombination? Any help is appreciated. Thank you.

CodePudding user response：

Here's a function that will do it. Just put the dataset where you want class=1 first.

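comb_data <- function(...){
    # Recover the argument names as written in the call (e.g. "d1", "d2", "d3")
    nms <- as.list(substitute(list(...)))[-1L]
    nms <- sapply(nms, as.character)
    dats <- list(...)
    # The first dataset is the main one: all of its rows get class = 1
    dats[[1]]$class <- rep(1, nrow(dats[[1]]))
    dats[[1]]$dataset <- nms[1]
    # Every other dataset gets class = 0; seq_along()[-1] is empty for a single dataset
    for(i in seq_along(dats)[-1]){
        dats[[i]]$class <- rep(0, nrow(dats[[i]]))
        dats[[i]]$dataset <- nms[[i]]
    }
    do.call(rbind, dats)
}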

comb_data(d1, d2, d3)
#    x class dataset
# 1  2     1      d1
# 2  3     1      d1
# 3  4     1      d1
# 4  6     1      d1
# 5  5     1      d1
# 6  5     1      d1
# 7  2     0      d2
# 8  6     0      d2
# 9  5     0      d2
# 10 4     0      d2
# 11 8     0      d2
# 12 6     0      d2
# 13 5     0      d3
# 14 3     0      d3
# 15 4     0      d3
# 16 4     0      d3
# 17 9     0      d3
# 18 6     0      d3
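
Because the function reads the dataset names from the call itself, each "one vs the rest" combination has to be written as its own call, with the main dataset first. The following is a minimal sketch of that rotation. The toy data frames are shortened stand-ins for the question's d1, d2, d3, and the function is restated in a slightly condensed form (an assumption, not the answer's exact code) so the block runs on its own.

```r
# Toy data in the same shape as the question's d1, d2, d3 (shortened)
d1 <- data.frame(x = c(2L, 3L), class = c(1L, 2L))
d2 <- data.frame(x = c(6L, 5L), class = c(1L, 2L))
d3 <- data.frame(x = c(9L, 6L), class = c(1L, 2L))

# Condensed restatement of comb_data(): first argument gets class 1, the rest 0
comb_data <- function(...) {
  # Names of the arguments as written in the call
  nms <- sapply(as.list(substitute(list(...)))[-1L], deparse)
  dats <- list(...)
  for (i in seq_along(dats)) {
    dats[[i]]$class <- rep(if (i == 1) 1 else 0, nrow(dats[[i]]))
    dats[[i]]$dataset <- nms[[i]]
  }
  do.call(rbind, dats)
}

# One call per rotation: the first argument is the "main" dataset
res_d1 <- comb_data(d1, d2, d3)
res_d2 <- comb_data(d2, d1, d3)
res_d3 <- comb_data(d3, d1, d2)

# Cross-tabulate to check that only the d2 rows carry class 1 in res_d2
table(res_d2$dataset, res_d2$class)
```

With many datasets, writing the calls by hand gets tedious, and a list-based approach is probably easier to loop over.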

CodePudding user response：

Here is another way using base R. It takes a named list of all the data.frames you want to combine and the name of the data.frame that should get class = 1:

list2combine = list("d1"=d1, "d2"=d2, "d3"=d3)

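combineData <- function(main="d1", list2combine){
  # Stack all data.frames; unname() avoids "d1.1"-style row names
  new <- do.call(rbind, unname(list2combine))
  # Label each row with the name of the data.frame it came from
  # (this also works when the list names themselves contain a ".")
  new$dataset <- rep(names(list2combine), sapply(list2combine, nrow))
  rownames(new) <- NULL
  # Rows of the main data.frame get class 1, all others 0
  new$class <- ifelse(new$dataset == main, 1, 0)
  return(new)
}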

To iterate through all the elements in the list, I would suggest using lapply, which returns a list with one combined data.frame per element:

result <- lapply(names(list2combine), function(x) { combineData(x, list2combine)})
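
The question mentions that the datasets sit in a folder, so the named list could be built from the files instead of typed by hand. The sketch below assumes the files are CSVs with columns x and class. The file format, the demo folder in tempdir(), and the helper name one_vs_rest are all assumptions made for illustration. In practice data_dir would point at the real folder.

```r
# Demo folder with three small CSV files (stand-in for the real folder)
data_dir <- file.path(tempdir(), "datasets_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(x = c(2L, 3L), class = c(1L, 2L)),
          file.path(data_dir, "d1.csv"), row.names = FALSE)
write.csv(data.frame(x = c(6L, 5L), class = c(1L, 2L)),
          file.path(data_dir, "d2.csv"), row.names = FALSE)
write.csv(data.frame(x = c(9L, 6L), class = c(1L, 2L)),
          file.path(data_dir, "d3.csv"), row.names = FALSE)

# Read every CSV into a list named after the file (without extension)
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
dats <- lapply(files, read.csv)
names(dats) <- tools::file_path_sans_ext(basename(files))

# Hypothetical helper: stack all datasets and flag the rows of `main` with 1
one_vs_rest <- function(main, dats) {
  out <- do.call(rbind, unname(dats))
  out$dataset <- rep(names(dats), vapply(dats, nrow, integer(1)))
  out$class <- as.integer(out$dataset == main)
  rownames(out) <- NULL
  out
}

# One combined data frame per dataset, named after its "main" dataset
result <- lapply(names(dats), one_vs_rest, dats = dats)
names(result) <- names(dats)
```

Naming the result list makes it possible to pick a combination by its main dataset, for example result$d2 or result[["d2"]].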