I have a working directory (C:/1/Datasets) that contains many datasets. As an example, I provide only three of them.
d1=structure(list(x = c(2L, 3L, 4L, 6L, 5L, 5L), class = c(1L, 1L,
1L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA, -6L
))
d2=structure(list(x = c(2L, 6L, 5L, 4L, 8L, 6L), class = c(1L, 1L,
1L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA, -6L
))
d3=structure(list(x = c(5L, 3L, 4L, 4L, 9L, 6L), class = c(1L, 1L,
1L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA, -6L
))
Each dataset has a metric variable x and a dependent variable class, which has two categories, 1 and 2. I want to recombine the datasets in the working folder with each other.
For example, we combine dataset d1 with datasets d2 and d3. In this union, all values of the variable class from dataset d1 should be set to 1, and the values of class from datasets d2 and d3 should become 0, like this:
x class dataset
2 1 d1
3 1 d1
4 1 d1
6 1 d1
5 1 d1
5 1 d1
2 0 d2
6 0 d2
5 0 d2
4 0 d2
8 0 d2
6 0 d2
5 0 d3
3 0 d3
4 0 d3
4 0 d3
9 0 d3
6 0 d3
After that, we change the join order: d2 is combined with d1 and d3, where all values of the class variable of dataset d2 become 1, and all values of the class variable of datasets d1 and d3 become 0.
All datasets in the working folder should be recombined in the same way. For example, if there are 4 datasets, it will look something like this:
1 vs 2-3-4
2 vs 1-3-4
3 vs 1-2-4
4 vs 1-2-3
Each time, the class variable of the main dataset is set to 1, and the class variable of the attached datasets becomes 0. As a result we get 4 separate datasets (the number depends on how many datasets are in the working directory).
How can I do such a recombination? Any help is appreciated. Thank you.
CodePudding user response:
Here's a function that will do it. Just put the dataset where you want class=1 first.
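comb_data <- function(...){
  # capture the names of the objects passed in, e.g. "d1", "d2", "d3"
  nms <- as.list(substitute(list(...)))[-1L]
  nms <- sapply(nms, as.character)
  dats <- list(...)
  # the first dataset is the main one: class = 1
  dats[[1]]$class <- rep(1, nrow(dats[[1]]))
  dats[[1]]$dataset <- nms[1]
  # all remaining datasets get class = 0
  # (seq_along(dats)[-1] is empty when only one dataset is passed, unlike 2:length(dats))
  for(i in seq_along(dats)[-1]){
    dats[[i]]$class <- rep(0, nrow(dats[[i]]))
    dats[[i]]$dataset <- nms[[i]]
  }
  do.call(rbind, dats)
}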
comb_data(d1, d2, d3)
# x class dataset
# 1 2 1 d1
# 2 3 1 d1
# 3 4 1 d1
# 4 6 1 d1
# 5 5 1 d1
# 6 5 1 d1
# 7 2 0 d2
# 8 6 0 d2
# 9 5 0 d2
# 10 4 0 d2
# 11 8 0 d2
# 12 6 0 d2
# 13 5 0 d3
# 14 3 0 d3
# 15 4 0 d3
# 16 4 0 d3
# 17 9 0 d3
# 18 6 0 d3
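Because comb_data reads the dataset names from the call itself, it is awkward to loop over. To get every "one vs. rest" combination, a list-based variant may be easier. The following is only a sketch: the helper name one_vs_rest and the list name dats are made up for illustration and are not part of the answer above.

```r
# Example data from the question
d1 <- data.frame(x = c(2L, 3L, 4L, 6L, 5L, 5L), class = c(1L, 1L, 1L, 2L, 2L, 2L))
d2 <- data.frame(x = c(2L, 6L, 5L, 4L, 8L, 6L), class = c(1L, 1L, 1L, 2L, 2L, 2L))
d3 <- data.frame(x = c(5L, 3L, 4L, 4L, 9L, 6L), class = c(1L, 1L, 1L, 2L, 2L, 2L))

# Hypothetical helper: `dats` is a named list of data.frames, `main` is the
# name of the dataset whose rows get class = 1. The main dataset is moved
# to the front so the output has the same layout as comb_data().
one_vs_rest <- function(main, dats) {
  ordered <- c(dats[main], dats[setdiff(names(dats), main)])
  pieces <- Map(function(d, nm) {
    d$class <- as.integer(nm == main)  # 1 for the main dataset, 0 otherwise
    d$dataset <- nm
    d
  }, ordered, names(ordered))
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

dats <- list(d1 = d1, d2 = d2, d3 = d3)

# One combined data.frame per dataset, named after its main dataset
all_combos <- lapply(setNames(names(dats), names(dats)), one_vs_rest, dats = dats)
```

With this, all_combos$d2 holds the "d2 vs d1-d3" table, and so on for each name in dats.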
CodePudding user response:
Here is another way using base R. It uses a named list of all the data.frames you want to combine and the name of the data.frame that should be the main one (class = 1):
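list2combine <- list("d1"=d1, "d2"=d2, "d3"=d3)
combineData <- function(main="d1", list2combine){
  # stack all data.frames; rbind prefixes the row names with the list names ("d1.1", "d1.2", ...)
  new <- do.call(rbind, list2combine)
  # recover the dataset name from the row-name prefix
  # (this assumes the list names themselves contain no ".")
  new$dataset <- sapply(strsplit(rownames(new),"[.]"), `[`, 1)
  rownames(new) <- NULL
  # 1 for the main dataset, 0 for all others
  new$class <- ifelse(new$dataset == main, 1, 0)
  return(new)
}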
To iterate through all the elements in the list, I would suggest lapply, which returns a list with one combined data.frame per main dataset:
result <- lapply(names(list2combine), function(x) { combineData(x, list2combine)})
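The question mentions that the datasets live in a folder, so the named list could also be built from the files instead of being typed by hand. The sketch below assumes the datasets are stored as CSV files with the columns x and class; the question does not say which file format is used, so read.csv and the ".csv" pattern are assumptions. A temporary folder with three small made-up files stands in for C:/1/Datasets so the code runs anywhere.

```r
# Assumption: one CSV file per dataset. A temporary folder replaces
# C:/1/Datasets here; the three files are made-up stand-ins.
data_dir <- tempfile("Datasets")
dir.create(data_dir)
write.csv(data.frame(x = c(2L, 3L, 4L), class = c(1L, 1L, 2L)),
          file.path(data_dir, "d1.csv"), row.names = FALSE)
write.csv(data.frame(x = c(2L, 6L, 5L), class = c(1L, 2L, 2L)),
          file.path(data_dir, "d2.csv"), row.names = FALSE)
write.csv(data.frame(x = c(5L, 3L, 4L), class = c(1L, 1L, 2L)),
          file.path(data_dir, "d3.csv"), row.names = FALSE)

# Read every CSV in the folder into a list named after the files
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
datasets <- lapply(files, read.csv)
names(datasets) <- tools::file_path_sans_ext(basename(files))

# Build one "main vs. rest" data.frame per dataset. The dataset column is
# filled from the row counts, so names containing "." are not a problem.
result <- lapply(names(datasets), function(main) {
  sizes <- vapply(datasets, nrow, integer(1))
  new <- do.call(rbind, datasets)
  new$dataset <- rep(names(datasets), times = sizes)
  new$class <- as.integer(new$dataset == main)
  rownames(new) <- NULL
  new
})
names(result) <- names(datasets)
```

For the real data, data_dir would be set to "C:/1/Datasets" and the write.csv lines dropped; if the files are in another format, the reading function would need to change accordingly.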