Splitting columns from an excel file and creating a list of elements

Time:07-02

I have an Excel file with 8 columns. The number of columns can change, so I need something that also works with a larger number of columns.

A B C D E F G H
x y z k j l m n
x y z k j l m n
x y z k j l m n

I need to create a list in which each element is a new data table with the combination of different columns:

table 1

A B E F
x y j l
x y j l
x y j l

table 2

C D G H
z k m n
z k m n
z k m n

Each table needs to be named with a number, from 1 to n. I tried this script:

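library(readr)  # provides read_csv()

excel_file <- read_csv("~/pathway.csv")
colnumbers <- 1:8
# ceiling(colnumbers / 4) yields 1 1 1 1 2 2 2 2, i.e. consecutive blocks of four
colsplits <- split(colnumbers, ceiling(colnumbers / 4))
purrr::iwalk(
  colsplits,
  ~ write.csv(excel_file[, .x], paste0(.y, ".csv"))
)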

Unfortunately, I get the wrong combination. It groups columns ABCD and then EFGH into a list of data tables named 1 and 2; the naming is what I want, but the grouping is not. Do you know how I can obtain the combinations ABEF and CDGH, and make it work for an initial Excel file of, say, 100 columns?

CodePudding user response:

Let's work this out on a plain vector first. The values themselves do not matter, as long as the length is always divisible by 4 and the first two elements of the front half pair with the first two of the back half, the next two with the next two, and so on.

Starting with length 12:

vec <- LETTERS[1:12]
spl <- cumsum((seq_along(vec)-1) %% 2 == 0)
spl
#  [1] 1 1 2 2 3 3 4 4 5 5 6 6
split(vec, spl - ifelse(spl > max(spl)/2, max(spl)/2, 0))
# $`1`
# [1] "A" "B" "G" "H"
# $`2`
# [1] "C" "D" "I" "J"
# $`3`
# [1] "E" "F" "K" "L"
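
To see why this works, it can help to print the intermediate grouping vector. This is a small sketch using the same `vec` and `spl` as above; the names `half` and `grp` are introduced here only for illustration.

```r
vec <- LETTERS[1:12]

# Pair index: consecutive elements share a number (1 1 2 2 3 3 ...).
spl <- cumsum((seq_along(vec) - 1) %% 2 == 0)

# Number of pairs in each half of the vector (illustrative name).
half <- max(spl) / 2

# Pairs in the back half are shifted down by `half`, so pair 4 joins pair 1,
# pair 5 joins pair 2, and so on.
grp <- spl - ifelse(spl > half, half, 0)
grp
#  [1] 1 1 2 2 3 3 1 1 2 2 3 3
```

Passing `grp` as the second argument to `split()` is what produces the three groups shown above.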

Demonstration with 8 and 16:

vec <- LETTERS[1:8]
spl <- cumsum((seq_along(vec)-1) %% 2 == 0)
split(vec, spl - ifelse(spl > max(spl)/2, max(spl)/2, 0))
# $`1`
# [1] "A" "B" "E" "F"
# $`2`
# [1] "C" "D" "G" "H"
vec <- LETTERS[1:16]
spl <- cumsum((seq_along(vec)-1) %% 2 == 0)
split(vec, spl - ifelse(spl > max(spl)/2, max(spl)/2, 0))
# $`1`
# [1] "A" "B" "I" "J"
# $`2`
# [1] "C" "D" "K" "L"
# $`3`
# [1] "E" "F" "M" "N"
# $`4`
# [1] "G" "H" "O" "P"
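
For a wider input such as the 100 columns mentioned in the question, the same grouping could be wrapped in a small function. The sketch below is one possible way to do that, not part of the original answer: `paired_half_groups` is a hypothetical name, and it uses integer division instead of `cumsum`, which should give the same groups as long as the length is a multiple of 4.

```r
# Hypothetical helper: returns the group number for each of n positions.
# Assumes n is a multiple of 4, as stated above.
paired_half_groups <- function(n) {
  stopifnot(n %% 4 == 0)
  pair <- (seq_len(n) + 1) %/% 2   # pair index: 1 1 2 2 3 3 ...
  half <- n / 4                    # number of pairs in each half
  ifelse(pair > half, pair - half, pair)
}

# Column positions for a 100-column input.
cols <- split(seq_len(100), paired_half_groups(100))
length(cols)
# [1] 25
cols[["1"]]
# [1]  1  2 51 52
```

Each element of `cols` holds four column positions, so indexing a data frame with one of them selects the matching columns.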

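How do we use this on a data frame? Relatively simple: call split.default so that the frame is split as a list of columns rather than by rows (which is what split does for a data frame).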

dat <- structure(list(A = c("x", "x", "x"), B = c("y", "y", "y"), C = c("z", "z", "z"), D = c("k", "k", "k"), E = c("j", "j", "j"), F = c("l", "l", "l"), G = c("m", "m", "m"), H = c("n", "n", "n")), class = "data.frame", row.names = c(NA, -3L))
spl <- cumsum((seq_along(dat)-1) %% 2 == 0)
split.default(dat, spl - ifelse(spl > max(spl)/2, max(spl)/2, 0))
# $`1`
#   A B E F
# 1 x y j l
# 2 x y j l
# 3 x y j l
# $`2`
#   C D G H
# 1 z k m n
# 2 z k m n
# 3 z k m n

dat <- cbind(dat, data.frame(I=1:3, J=1:3, K=1:3, L=1:3))
spl <- cumsum((seq_along(dat)-1) %% 2 == 0)
split.default(dat, spl - ifelse(spl > max(spl)/2, max(spl)/2, 0))
# $`1`
#   A B G H
# 1 x y m n
# 2 x y m n
# 3 x y m n
# $`2`
#   C D I J
# 1 z k 1 1
# 2 z k 2 2
# 3 z k 3 3
# $`3`
#   E F K L
# 1 j l 1 1
# 2 j l 2 2
# 3 j l 3 3
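
To get the numbered files the question asks for, the resulting list could be written out one element at a time. The following is a minimal sketch under a few assumptions: it rebuilds the 8-column example compactly, writes to a temporary directory (`out_dir` is an illustrative name), and uses a base R loop in place of `purrr::iwalk`.

```r
# Same example data as above, rebuilt compactly (8 columns, 3 rows).
dat <- as.data.frame(
  matrix(rep(c("x", "y", "z", "k", "j", "l", "m", "n"), each = 3),
         nrow = 3, dimnames = list(NULL, LETTERS[1:8]))
)

spl <- cumsum((seq_along(dat) - 1) %% 2 == 0)
tables <- split.default(dat, spl - ifelse(spl > max(spl) / 2, max(spl) / 2, 0))

# Assumption: files go to a temporary directory; swap in your own folder.
out_dir <- tempdir()

# The list names are "1", "2", ..., so they double as file names.
for (nm in names(tables)) {
  write.csv(tables[[nm]], file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}
```

With the 8-column example this should produce `1.csv` (columns A, B, E, F) and `2.csv` (columns C, D, G, H) in `out_dir`.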