I have an excel file of 8 columns. The number of columns can change, so I need to write something able to work with a bigger number of columns too.
A | B | C | D | E | F | G | H |
---|---|---|---|---|---|---|---|
x | y | z | k | j | l | m | n |
x | y | z | k | j | l | m | n |
x | y | z | k | j | l | m | n |
I need to create a list in which each element is a new data table with the combination of different columns:
table1
A | B | E | F |
---|---|---|---|
x | y | z | k |
x | y | z | k |
x | y | z | k |
table 2
C | D | G | H |
---|---|---|---|
x | y | z | k |
x | y | z | k |
x | y | z | k |
Each table needs to be named with numbers, from 1 to n. With this script:
excel_file<- read_csv("~/pathway.csv")
colnumbers<-1:8
colsplits<-split(colnumbers, ceiling(colnumbers/4))
purrr::iwalk(
colsplits,
~ write.csv(excel_file[, .x], paste0(.y, ".csv"))
)
Unfortunately I obtain the wrong combination. It puts together columns ABCD and then EFGH, in a new list of data table named 1 and 2, as I want. Do you know how can I obtain he combination of ABEF and CDGH, and make it work also for an initial excel of let's say, 100 columns?
CodePudding user response:
Let's solve this trick with looking at a vector first. Frankly, it does not matter what the values are, so long as we know that it is always divisible by 4, and we know that the first 2 from the front half pairs with the first 2 of the second half, etc.
Starting with length 12:
vec <- LETTERS[1:12]
spl <- cumsum((seq_along(vec)-1) %% 2 == 0)
spl
# [1] 1 1 2 2 3 3 4 4 5 5 6 6
split(vec, spl - ifelse(spl > max(spl)/2, max(spl)/2, 0))
# $`1`
# [1] "A" "B" "G" "H"
# $`2`
# [1] "C" "D" "I" "J"
# $`3`
# [1] "E" "F" "K" "L"
Demonstration with 8 and 16:
vec <- LETTERS[1:8]
spl <- cumsum((seq_along(vec)-1) %% 2 == 0)
split(vec, spl - ifelse(spl > max(spl)/2, max(spl)/2, 0))
# $`1`
# [1] "A" "B" "E" "F"
# $`2`
# [1] "C" "D" "G" "H"
vec <- LETTERS[1:16]
spl <- cumsum((seq_along(vec)-1) %% 2 == 0)
split(vec, spl - ifelse(spl > max(spl)/2, max(spl)/2, 0))
# $`1`
# [1] "A" "B" "I" "J"
# $`2`
# [1] "C" "D" "K" "L"
# $`3`
# [1] "E" "F" "M" "N"
# $`4`
# [1] "G" "H" "O" "P"
How do we use this? Relatively simple:
dat <- structure(list(A = c("x", "x", "x"), B = c("y", "y", "y"), C = c("z", "z", "z"), D = c("k", "k", "k"), E = c("j", "j", "j"), F = c("l", "l", "l"), G = c("m", "m", "m"), H = c("n", "n", "n")), class = "data.frame", row.names = c(NA, -3L))
spl <- cumsum((seq_along(dat)-1) %% 2 == 0)
split.default(dat, spl - ifelse(spl > max(spl)/2, max(spl)/2, 0))
# $`1`
# A B E F
# 1 x y j l
# 2 x y j l
# 3 x y j l
# $`2`
# C D G H
# 1 z k m n
# 2 z k m n
# 3 z k m n
dat <- cbind(dat, data.frame(I=1:3, J=1:3, K=1:3, L=1:3))
spl <- cumsum((seq_along(dat)-1) %% 2 == 0)
split.default(dat, spl - ifelse(spl > max(spl)/2, max(spl)/2, 0))
# $`1`
# A B G H
# 1 x y m n
# 2 x y m n
# 3 x y m n
# $`2`
# C D I J
# 1 z k 1 1
# 2 z k 2 2
# 3 z k 3 3
# $`3`
# E F K L
# 1 j l 1 1
# 2 j l 2 2
# 3 j l 3 3