Splitting a tab delimited table into chunks of columns using BASH

Time:11-03

Let's say I have a 100-line tab-delimited file with 100 values in each line. Is there any way to split the file so that I get five 100-line files with 20 values in each line, essentially splitting the 100 columns into chunks of 20 columns? The solution should scale to about 60k columns split into 60 chunks (1000 columns each).

I tried using split, but I quickly realized that it only splits along lines rather than columns.

CodePudding user response:

Well, that is what cut does.

cut -f 1-20 < file > subfile1
cut -f 21-40 < file > subfile2
...
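
To see what those two commands produce, here is a small self-contained check on a made-up 3-line, 4-column table. The names demo_dir, demo.tsv, left.tsv and right.tsv are only for illustration and are not part of the answer above.

```shell
# Scratch directory so nothing in the current directory is touched.
demo_dir=$(mktemp -d)

# Build a small tab-delimited table: 3 lines, 4 columns.
printf 'a1\ta2\ta3\ta4\n' >  "$demo_dir/demo.tsv"
printf 'b1\tb2\tb3\tb4\n' >> "$demo_dir/demo.tsv"
printf 'c1\tc2\tc3\tc4\n' >> "$demo_dir/demo.tsv"

# cut uses the tab character as its default delimiter,
# so -f alone is enough to select columns.
cut -f 1-2 < "$demo_dir/demo.tsv" > "$demo_dir/left.tsv"
cut -f 3-4 < "$demo_dir/demo.tsv" > "$demo_dir/right.tsv"

# Show the first chunk (columns 1 and 2 of every line).
cat "$demo_dir/left.tsv"
```

Each output file keeps all the lines of the input and only the selected columns.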

Looping through chunks

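chunksize=1000
# Number of columns, counted from the first line of the file
ncol=$(head -n 1 file | awk -F'\t' '{print NF}')
for ((col=1; col <= ncol; col = col + chunksize))
do
   cut -f $col-$((col + chunksize - 1)) < file > subfile_$col
done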
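
The same idea can be wrapped in a small function. This is only a sketch: the name split_columns, its three arguments, and the PREFIX_<first column> naming are hypothetical choices, and it assumes every line has the same number of tab-separated columns as the first line. It uses a plain while loop so that it also runs under sh, and it clamps the last range for the case where the column count is not a multiple of the chunk size.

```shell
# split_columns FILE CHUNKSIZE PREFIX   (hypothetical helper)
# Writes PREFIX_<first column> files, each holding up to CHUNKSIZE columns.
# The body runs in a subshell ( ... ) so its variables do not leak out.
split_columns() (
    file=$1
    chunksize=$2
    prefix=$3

    # Count tab-separated fields on the first line only.
    ncol=$(head -n 1 "$file" | awk -F'\t' '{ print NF }')

    col=1
    while [ "$col" -le "$ncol" ]; do
        # Clamp the last chunk so the range never runs past the final column.
        last=$((col + chunksize - 1))
        if [ "$last" -gt "$ncol" ]; then
            last=$ncol
        fi
        cut -f "$col-$last" < "$file" > "${prefix}_${col}"
        col=$((col + chunksize))
    done
)

# Demo on a made-up 2-line, 5-column table, split into chunks of 2 columns.
demo_dir=$(mktemp -d)
printf '1\t2\t3\t4\t5\n6\t7\t8\t9\t10\n' > "$demo_dir/table.tsv"
split_columns "$demo_dir/table.tsv" 2 "$demo_dir/part"
ls "$demo_dir"
```

Note that this reads the input once per chunk, so 60 chunks mean 60 passes over the file.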

CodePudding user response:

You can try using rq (https://github.com/fuyuncat/rquery/releases) to write each chunk of columns to a different file:

./rq -q "p d/,/ | s foreach(1,%,appendFile($+when(mod(#,5)=0 or #=@%,'\n',','),'/tmp/tmp'+ceil(#/5.0)+'.csv'))" samples/myfile.csv

Then concatenate them into a new file:

cat /tmp/tmp*.csv > /tmp/new.csv