Loop to create a DF from values in bash

Time:05-18

I'm creating various text files from a file like this:

Chrom_x,Pos,Ref,Alt,RawScore,PHRED,ID,Chrom_y
10,113934,A,C,0.18943,5.682,rs10904494,10
10,126070,C,T,0.030435000000000007,3.102,rs11591988,10
10,135656,T,G,0.128584,4.732,rs10904561,10
10,135853,A,G,0.264891,6.755,rs7906287,10
10,148325,A,G,0.175257,5.4670000000000005,rs9419557,10
10,151997,T,C,-0.21169,0.664,rs9286070,10
10,158202,C,T,-0.30357,0.35700000000000004,rs9419478,10
10,158946,C,T,2.03221,19.99,rs11253562,10
10,159076,G,A,1.403107,15.73,rs4881551,10

What I am trying to do is extract, in bash, all rows whose PHRED value (column 6) lies between two values, for example 0 and 5:

gawk -F, '$6>=0 && $6<=5 {print $0}' file.csv > 0_5.txt

I also want to create files for 6 to 10, 11 to 15, and so on up to 95 to 100. I was thinking of creating a loop for this with something like:

#!/usr/bin/env bash
n=( 0,5,6,10...)
if i in n:
 gawk '$6>=n && $NF<=n+1 {print $0}' file.csv > n_n+1.txt

and so on.

How can I convert this into a loop that creates a file for each of these ranges?

Answer:

While you could use a shell loop to provide inputs to an awk script, you could also just use awk to natively split the values into buckets and write the lines to those "bucket" files itself:

awk -F, 'NR > 1 {
    # Bucket index: 0 for [0,5), 1 for [5,10), 2 for [10,15), ...
    i = int($6 / 5)
    fname = (i * 5) "_" ((i + 1) * 5) ".txt"
    print $0 > fname
}' < file.csv

The code sets the field separator to a comma (-F,), skips the header line (NR > 1), and computes a "bucket index" by dividing the value in column six (PHRED) by five and truncating it with int(). The file name is then built by multiplying that index, and the index plus one, by five, so a PHRED of 6.755 lands in 5_10.txt. The whole line is then printed to that file.
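As a quick sanity check of the bucket rule described above (divide the PHRED value by five and truncate), the sketch below maps a few PHRED values from the sample data to the file each would land in. bucket_name is a hypothetical helper introduced only for this illustration; it relies on awk's int(), which truncates toward zero, so it assumes non-negative scores.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the answer): print the bucket file name
# for one PHRED value, using index = int(value / 5) and range [index*5, (index+1)*5).
# Assumes the value is non-negative.
bucket_name() {
  awk -v v="$1" 'BEGIN { i = int(v / 5); printf "%d_%d.txt\n", i * 5, (i + 1) * 5 }'
}

# A few PHRED values taken from the sample data in the question.
for v in 0.357 4.732 5.682 6.755 15.73 19.99; do
  printf '%s -> %s\n' "$v" "$(bucket_name "$v")"
done
```

This should print 0_5.txt for 0.357 and 4.732, 5_10.txt for 5.682 and 6.755, and 15_20.txt for 15.73 and 19.99. A value of exactly 5 goes to 5_10.txt, because each range includes only its lower bound.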

To use a shell loop (and call awk 20 times on the input), you could use something like this:

for ((i = 0; i <= 19; i++))
do
  floor=$((i * 5))
  ceiling=$(( (i + 1) * 5 ))
  awk -F, -v floor="$floor" -v ceiling="$ceiling" \
    'NR > 1 && $6 >= floor && $6 < ceiling { print }' < file.csv \
  > "${floor}_${ceiling}.txt"
done

The basic idea is the same; here, the outer loop creates the bucket index and passes the range into awk as the floor and ceiling variables. Each range is half-open ($6 >= floor && $6 < ceiling), so a value of exactly 5 goes into 5_10.txt rather than 0_5.txt. awk only prints the matching lines; the shell redirects that output into the appropriate file.
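To try the loop without touching real data, here is a self-contained sketch that writes the sample rows from the question into a temporary directory, runs the same floor/ceiling loop, and then deletes the output files that came out empty (most of the 20 ranges have no rows in such a small sample). The temporary directory and the find -empty -delete cleanup are additions for this illustration, not part of the original answer.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so no real files are overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Sample rows copied from the question.
cat > file.csv <<'EOF'
Chrom_x,Pos,Ref,Alt,RawScore,PHRED,ID,Chrom_y
10,113934,A,C,0.18943,5.682,rs10904494,10
10,126070,C,T,0.030435000000000007,3.102,rs11591988,10
10,135656,T,G,0.128584,4.732,rs10904561,10
10,135853,A,G,0.264891,6.755,rs7906287,10
10,148325,A,G,0.175257,5.4670000000000005,rs9419557,10
10,151997,T,C,-0.21169,0.664,rs9286070,10
10,158202,C,T,-0.30357,0.35700000000000004,rs9419478,10
10,158946,C,T,2.03221,19.99,rs11253562,10
10,159076,G,A,1.403107,15.73,rs4881551,10
EOF

# One awk pass per 5-wide range [floor, ceiling), as in the answer.
for ((i = 0; i <= 19; i++)); do
  floor=$((i * 5))
  ceiling=$(( (i + 1) * 5 ))
  awk -F, -v floor="$floor" -v ceiling="$ceiling" \
    'NR > 1 && $6 >= floor && $6 < ceiling { print }' < file.csv \
    > "${floor}_${ceiling}.txt"
done

# Remove the ranges that matched nothing, then show the row count per file.
find . -maxdepth 1 -name '*_*.txt' -empty -delete
wc -l -- *_*.txt
```

With the nine sample rows this should leave 0_5.txt (4 rows), 5_10.txt (3 rows) and 15_20.txt (2 rows); the other 17 ranges are empty and get deleted.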
