Splitting columns from an excel file and creating a list of elements-CodePudding

I have an excel file of 8 columns. The number of columns can change, so I need to write something able to work with a bigger number of columns too.

A	B	C	D	E	F	G	H
x	y	z	k	j	l	m	n
x	y	z	k	j	l	m	n
x	y	z	k	j	l	m	n

I need to create a list in which each element is a new data table with the combination of different columns:

table1

A	B	E	F
x	y	z	k
x	y	z	k
x	y	z	k

table 2

C	D	G	H
x	y	z	k
x	y	z	k
x	y	z	k

Each table needs to be named with numbers, from 1 to n. With this script:

excel_file<- read_csv("~/pathway.csv")
colnumbers<-1:8
colsplits<-split(colnumbers, ceiling(colnumbers/4))
purrr::iwalk(
  colsplits,
  ~ write.csv(excel_file[, .x], paste0(.y, ".csv"))
)

Unfortunately I obtain the wrong combination. It puts together columns ABCD and then EFGH, in a new list of data table named 1 and 2, as I want. Do you know how can I obtain he combination of ABEF and CDGH, and make it work also for an initial excel of let's say, 100 columns?

CodePudding user response：

Let's solve this trick with looking at a vector first. Frankly, it does not matter what the values are, so long as we know that it is always divisible by 4, and we know that the first 2 from the front half pairs with the first 2 of the second half, etc.

Starting with length 12:

vec <- LETTERS[1:12]
spl <- cumsum((seq_along(vec)-1) %% 2 == 0)
spl
#  [1] 1 1 2 2 3 3 4 4 5 5 6 6
split(vec, spl - ifelse(spl > max(spl)/2, max(spl)/2, 0))
# $`1`
# [1] "A" "B" "G" "H"
# $`2`
# [1] "C" "D" "I" "J"
# $`3`
# [1] "E" "F" "K" "L"

Demonstration with 8 and 16:

vec <- LETTERS[1:8]
spl <- cumsum((seq_along(vec)-1) %% 2 == 0)
split(vec, spl - ifelse(spl > max(spl)/2, max(spl)/2, 0))
# $`1`
# [1] "A" "B" "E" "F"
# $`2`
# [1] "C" "D" "G" "H"
vec <- LETTERS[1:16]
spl <- cumsum((seq_along(vec)-1) %% 2 == 0)
split(vec, spl - ifelse(spl > max(spl)/2, max(spl)/2, 0))
# $`1`
# [1] "A" "B" "I" "J"
# $`2`
# [1] "C" "D" "K" "L"
# $`3`
# [1] "E" "F" "M" "N"
# $`4`
# [1] "G" "H" "O" "P"

How do we use this? Relatively simple:

dat <- structure(list(A = c("x", "x", "x"), B = c("y", "y", "y"), C = c("z", "z", "z"), D = c("k", "k", "k"), E = c("j", "j", "j"), F = c("l", "l", "l"), G = c("m", "m", "m"), H = c("n", "n", "n")), class = "data.frame", row.names = c(NA, -3L))
spl <- cumsum((seq_along(dat)-1) %% 2 == 0)
split.default(dat, spl - ifelse(spl > max(spl)/2, max(spl)/2, 0))
# $`1`
#   A B E F
# 1 x y j l
# 2 x y j l
# 3 x y j l
# $`2`
#   C D G H
# 1 z k m n
# 2 z k m n
# 3 z k m n

dat <- cbind(dat, data.frame(I=1:3, J=1:3, K=1:3, L=1:3))
spl <- cumsum((seq_along(dat)-1) %% 2 == 0)
split.default(dat, spl - ifelse(spl > max(spl)/2, max(spl)/2, 0))
# $`1`
#   A B G H
# 1 x y m n
# 2 x y m n
# 3 x y m n
# $`2`
#   C D I J
# 1 z k 1 1
# 2 z k 2 2
# 3 z k 3 3
# $`3`
#   E F K L
# 1 j l 1 1
# 2 j l 2 2
# 3 j l 3 3