Why is my for loop treating each column as a separate line?

Time:01-14

I have a text file with this two-column structure:

$ head sigScaf.txt
scaffold4471 1404
scaffold4514 61
scaffold4514 100
scaffold4514 312
scaffold4514 313
scaffold6052 12989
scaffold25893 980
scaffold31460 67
scaffold54069 553
scaffold54660 7705

I want to write a loop that generates two temporary files, one containing the contents of the first column (scaffold number) and the other containing the contents of the second column (integer), and then prints the contents of those files on each iteration.

The script I have written looks like this:

$ cat test.sh

#!/bin/bash

for i in $(cat sigScaf.txt)
do

echo $i >sig.tmp

awk '{print $1}' sig.tmp >scaf.tmp
awk '{print $2}' sig.tmp >pos.tmp

echo "Scaffold:"
cat scaf.tmp

echo "Position:"
cat pos.tmp

done

I assumed the output would look like the following:

$ bash test.sh | head
Scaffold: scaffold4471
Position: 1404
Scaffold: scaffold4514
Position: 61
Scaffold: scaffold4514
Position: 100
Scaffold: scaffold4514
Position: 312
Scaffold: scaffold4514
Position: 313

However, the output looks like this (with extra lines to properly demonstrate the issue):

$ bash test.sh | head -50
Scaffold:
scaffold4471
Position:

Scaffold:
1404
Position:

Scaffold:
scaffold4514
Position:

Scaffold:
61
Position:

Scaffold:
scaffold4514
Position:

Scaffold:
100
Position:

Scaffold:
scaffold4514
Position:

Scaffold:
312
Position:

Scaffold:
scaffold4514
Position:

Scaffold:
313
Position:

Scaffold:
scaffold6052
Position:

Scaffold:
12989
Position:

Scaffold:
scaffold25893

It appears that the second column is being treated as a new line: every second iteration of the loop writes the integer to the temporary scaf.tmp file and leaves pos.tmp empty, since there is no second column for awk to read.

I feel like the solution is right in front of me but I have tried for so long to find the source of the problem that the script has lost all meaning to me.

Does anyone know what might cause this?

Cheers

CodePudding user response:

Separate the columns using awk:

awk '{print $1 > "scaf.tmp"; print $2 > "pos.tmp"}' sigScaf.txt
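If the loop needs to stay in bash, the cause is worth spelling out: for i in $(cat sigScaf.txt) splits the command output on any whitespace, spaces as well as newlines, so each column becomes its own iteration. Below is a minimal sketch of a line-based alternative, assuming every line has exactly two whitespace-separated columns. The function name print_pairs and the sample file are made up for illustration and are not part of the original script.

```shell
#!/bin/bash
# Hypothetical sample input mirroring the first lines of sigScaf.txt.
sample=$(mktemp)
printf '%s\n' 'scaffold4471 1404' 'scaffold4514 61' > "$sample"

# print_pairs is an illustrative helper name (an assumption, not from the question).
print_pairs() {
    # read consumes one line per iteration and splits it on whitespace
    # into the two variables, so no temporary files are needed.
    while read -r scaf pos; do
        echo "Scaffold: $scaf"
        echo "Position: $pos"
    done < "$1"
}

print_pairs "$sample"
rm -f "$sample"
```

Because read works on whole lines, both columns are available in the same iteration, which is what the original for loop could not provide.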

CodePudding user response:

GNU AWK has an implicit loop over the input lines, so you can use it to get the desired output without creating temporary files. Let the content of file.txt be

scaffold4471 1404
scaffold4514 61
scaffold4514 100
scaffold4514 312
scaffold4514 313
scaffold6052 12989
scaffold25893 980
scaffold31460 67
scaffold54069 553
scaffold54660 7705

then

awk '{printf "Scaffold: %s\nPosition: %s\n",$1,$2}' file.txt

gives output

Scaffold: scaffold4471
Position: 1404
Scaffold: scaffold4514
Position: 61
Scaffold: scaffold4514
Position: 100
Scaffold: scaffold4514
Position: 312
Scaffold: scaffold4514
Position: 313
Scaffold: scaffold6052
Position: 12989
Scaffold: scaffold25893
Position: 980
Scaffold: scaffold31460
Position: 67
Scaffold: scaffold54069
Position: 553
Scaffold: scaffold54660
Position: 7705

Explanation: printf formats and then prints a string; each %s marks a place for insertion, and $1 and $2 are the values to insert. Note that the trailing \n is required because, unlike print, printf does not append ORS by default.

(tested in GNU Awk 5.0.1)
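To make the point about ORS concrete, here is a small sketch comparing printf without a trailing \n against print. The variable names line, no_newline, and with_print are illustrative only, and the sample line is piped in directly rather than read from file.txt.

```shell
#!/bin/bash
# One sample line, piped in so no input file is needed.
line='scaffold4471 1404'

# printf without \n: nothing separates the two fields in the output.
no_newline=$(echo "$line" | awk '{printf "%s", $1; printf "%s", $2}')

# print appends ORS (a newline by default) after each call.
with_print=$(echo "$line" | awk '{print $1; print $2}')

echo "printf without newline: $no_newline"
echo "print:"
echo "$with_print"
```

With printf the two fields run together on one line, while print puts each on its own line.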
