I have a text file with the following two-column structure:
$ head sigScaf.txt
scaffold4471 1404
scaffold4514 61
scaffold4514 100
scaffold4514 312
scaffold4514 313
scaffold6052 12989
scaffold25893 980
scaffold31460 67
scaffold54069 553
scaffold54660 7705
I want to write a loop that, for each line, generates two temporary files, one containing the first column (scaffold number) and the other containing the second column (integer), and then prints the contents of those files on each iteration.
The script I have written looks like this:
$ cat test.sh
#!/bin/bash
for i in $(cat sigScaf.txt)
do
echo $i >sig.tmp
awk '{print $1}' sig.tmp >scaf.tmp
awk '{print $2}' sig.tmp >pos.tmp
echo "Scaffold:"
cat scaf.tmp
echo "Position:"
cat pos.tmp
done
I assumed the output would look like the following:
$ bash test.sh | head
Scaffold: scaffold4471
Position: 1404
Scaffold: scaffold4514
Position: 61
Scaffold: scaffold4514
Position: 100
Scaffold: scaffold4514
Position: 312
Scaffold: scaffold4514
Position: 313
However, the output looks like this (with extra lines to properly demonstrate the issue):
$ bash test.sh | head -50
Scaffold:
scaffold4471
Position:
Scaffold:
1404
Position:
Scaffold:
scaffold4514
Position:
Scaffold:
61
Position:
Scaffold:
scaffold4514
Position:
Scaffold:
100
Position:
Scaffold:
scaffold4514
Position:
Scaffold:
312
Position:
Scaffold:
scaffold4514
Position:
Scaffold:
313
Position:
Scaffold:
scaffold6052
Position:
Scaffold:
12989
Position:
Scaffold:
scaffold25893
It appears that the second column is instead being treated as a new line, so every second iteration of the loop writes the integer to the temporary scaf.tmp file and leaves the pos.tmp file empty, as there appears to be no second column for awk to read.
I feel like the solution is right in front of me but I have tried for so long to find the source of the problem that the script has lost all meaning to me.
Does anyone know what might cause this?
Cheers
CodePudding user response:
Separate the columns with a single awk call, without a shell loop:
awk '{print $1 > "scaf.tmp"; print $2 > "pos.tmp"}' sigScaf.txt
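As for the cause: `for i in $(cat sigScaf.txt)` lets the shell split the file's contents on every space and newline, so each iteration receives a single word rather than a whole line. The following is a minimal sketch of that behaviour next to a `while read` loop that keeps both columns together. The three-line sample file and the variable names `words`, `scaf`, `pos` and `result` are made up for illustration and are not part of the original script.

```shell
# Hypothetical three-line sample so the sketch is self-contained
tmpdir=$(mktemp -d)
printf '%s\n' 'scaffold4471 1404' 'scaffold4514 61' 'scaffold4514 100' > "$tmpdir/sigScaf.txt"

# $(cat file) is word-split on spaces AND newlines,
# so the loop body runs once per word, not once per line
words=0
for i in $(cat "$tmpdir/sigScaf.txt"); do
    words=$((words + 1))
done
echo "for loop iterations: $words"

# read consumes one line at a time and splits it into two variables
result=$(while read -r scaf pos; do
    printf 'Scaffold: %s\nPosition: %s\n' "$scaf" "$pos"
done < "$tmpdir/sigScaf.txt")
echo "$result"

rm -rf "$tmpdir"
```

With three input lines, the `for` loop iterates six times, which matches the alternating scaffold/integer pattern in the question's output.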
CodePudding user response:
GNU AWK has an implicit loop over the input lines, so you can use it to get the desired output without creating temporary files. Let the content of file.txt be
scaffold4471 1404
scaffold4514 61
scaffold4514 100
scaffold4514 312
scaffold4514 313
scaffold6052 12989
scaffold25893 980
scaffold31460 67
scaffold54069 553
scaffold54660 7705
then
awk '{printf "Scaffold: %s\nPosition: %s\n",$1,$2}' file.txt
gives output
Scaffold: scaffold4471
Position: 1404
Scaffold: scaffold4514
Position: 61
Scaffold: scaffold4514
Position: 100
Scaffold: scaffold4514
Position: 312
Scaffold: scaffold4514
Position: 313
Scaffold: scaffold6052
Position: 12989
Scaffold: scaffold25893
Position: 980
Scaffold: scaffold31460
Position: 67
Scaffold: scaffold54069
Position: 553
Scaffold: scaffold54660
Position: 7705
Explanation: I use printf to format and then print the string; %s marks the place where a value is inserted, and $1 and $2 are the values to insert. Note that the trailing \n is required because, unlike print, printf does not append ORS.
(tested in GNU Awk 5.0.1)
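The difference between print and printf can be checked with a small sketch like the one below. The one-line sample and the variable names `with_print`, `with_printf` and `no_newline` are hypothetical and only serve the illustration.

```shell
# Hypothetical one-line sample taken from the input format above
sample='scaffold4471 1404'

# print appends ORS (a newline by default) after each record it writes
with_print=$(printf '%s\n' "$sample" | awk '{print "Scaffold: " $1; print "Position: " $2}')

# printf writes exactly what the format string says, so \n must be spelled out
with_printf=$(printf '%s\n' "$sample" | awk '{printf "Scaffold: %s\nPosition: %s\n", $1, $2}')

# Without \n the two pieces run together on a single line
no_newline=$(printf '%s\n' "$sample" | awk '{printf "Scaffold: %s", $1; printf "Position: %s", $2}')

echo "$with_print"
echo "$with_printf"
echo "$no_newline"
```

The first two commands produce the same two lines, while the third prints `Scaffold: scaffold4471Position: 1404` on one line.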