Let's say I have a 100-line tab-delimited file with 100 values in each line. Is there any way to split the file so that I get five 100-line files with 20 values in each line, essentially splitting the 100 columns into chunks of 20 columns? The solution should scale to about 60k columns split into 60 chunks of 1,000 columns each.
I tried using split, but I quickly realized it only splits along lines, not columns.
CodePudding user response:
Well, that is what cut does:
cut -f 1-20 < file > subfile1
cut -f 21-40 < file > subfile2
...
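As a quick sanity check (not part of the original answer), counting the tab-separated fields on the first line of a chunk should print 20:
head -n 1 subfile1 | awk -F'\t' '{print NF}'   # expect 20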
Looping through chunks
chunksize=1000
ncol=$(head -n 1 file | awk -F'\t' '{print NF}')   # total number of columns
for ((col = 1; col <= ncol; col += chunksize))
do
    cut -f $col-$((col + chunksize - 1)) < file > subfile_$col
done
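If re-reading the whole file once per chunk becomes slow at 60k columns, a single awk pass can write every chunk in one read. This is only a sketch under the question's assumptions (tab-delimited input, 1,000 columns per chunk); the output names chunk_1, chunk_2, ... are illustrative, not from the answer above:
awk -F'\t' -v OFS='\t' -v chunksize=1000 '
{
    # For every line, emit each block of chunksize columns to its own file.
    for (start = 1; start <= NF; start += chunksize) {
        out = "chunk_" int((start - 1) / chunksize + 1)
        line = $start
        for (i = start + 1; i < start + chunksize && i <= NF; i++)
            line = line OFS $i
        print line > out   # awk keeps the file open, so later lines append
    }
}' file
This reads the input only once, at the cost of keeping one output file open per chunk.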
CodePudding user response:
You can try rq (https://github.com/fuyuncat/rquery/releases) to split each chunk of columns into different files:
./rq -q "p d/,/ | s foreach(1,%,appendFile($ when(mod(#,5)=0 or #=@%,'\n',','),'/tmp/tmp' ceil(#/5.0) '.csv'))" samples/myfile.csv
Then concatenate them into a new file:
cat /tmp/tmp*.csv > /tmp/new.csv