In several files, I would like to extract the lines (with their line numbers)
- which contain the ClNonZ pattern
- and which have the value "REAL" as first attribute (the second column).
For a single file, the line breaks are preserved.
But I have several files, so I use a "for" loop, and then the multiple matches from one file are printed on a single line, without line breaks.
Example:
$ cat foo1.txt
A TEST 0.959660297 0 0.021231423 -0.0073 -0.0031 MhZisp
B REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ
C REAL 0.98089172 0 0 -0.0158 0.0124 MhNonZ
D TEST 0.704883227 0.265392781 0.010615711 -0.0087 -0.0092 MhZisp
E REAL 0.010615711 0.959660297 0.010615711 0.0476 0.0061 ClNonZ
F TEST 0.704883227 0.265392781 0.010458211 0.0865 0.0548 ClNonZ
$ cat foo2.txt
A TEST 0.715498938 0 0.265392781 -0.0013 -0.0309 Unkn
B REAL 0.927813163 0 0.053078556 -0.0051 -0.0636 MhZisp
C TEST 0.55626327 0.222929936 0.201698514 0.0053 -0.0438 MhZisp
D REAL 0.492569002 0.350318471 0.138004246 0.0485 0.0088 ClNonZ
E REAL 0.704883227 0.265392781 0.010615711 0.0476 0.0061 AbbbbZ
F REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ
grep without a loop: the result is what I want, with line breaks:
$ grep -n ClNonZ foo1.txt | awk '$2 == "REAL" {print $0}'
2:B REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ
5:E REAL 0.010615711 0.959660297 0.010615711 0.0476 0.0061 ClNonZ
grep in a for loop: bad presentation, the line breaks have disappeared:
$ for file in `ls foo*` ; do line=`grep -n ClNonZ $file | awk '$2 == "REAL" {print $0}' `; if [[ -n "$line" ]]; then echo $file ; echo $line ; echo " " ; fi ; done
foo1.txt
2:B REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ 5:E REAL 0.010615711 0.959660297 0.010615711 0.0476 0.0061 ClNonZ
foo2.txt
4:D REAL 0.492569002 0.350318471 0.138004246 0.0485 0.0088 ClNonZ 6:F REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ
I tried using "while" instead of "for" (as explained in http://mywiki.wooledge.org/BashFAQ/001 and as suggested by @chepner), without success.
Would you have an idea that could help me, please?
CodePudding user response:
The primary problem here is that you didn't double-quote your variable references, especially in echo $line (should be echo "$line"). Unquoted, the shell splits the value on whitespace, newlines included, and echo joins the resulting words with single spaces, which is why the line breaks disappear. See "I just assigned a variable, but echo $variable shows something else" and "When should I double-quote a parameter expansion?" (short answer: almost always).
Shellcheck.net is good at pointing out common mistakes like this, and will also have some other good recommendations for your code. I recommend using it!
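As a minimal sketch of the quoting fix, here is the question's loop with every expansion double-quoted. The function name report_matches and the scratch directory are illustrative and not part of the original; the sample data is copied from foo1.txt and foo2.txt in the question. The sketch also passes the filenames as arguments (expanded from a glob) instead of parsing ls, and uses $(...) rather than backticks.

```shell
#!/bin/sh
# Recreate the question's sample files in a scratch directory (illustrative setup).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > foo1.txt <<'EOF'
A TEST 0.959660297 0 0.021231423 -0.0073 -0.0031 MhZisp
B REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ
C REAL 0.98089172 0 0 -0.0158 0.0124 MhNonZ
D TEST 0.704883227 0.265392781 0.010615711 -0.0087 -0.0092 MhZisp
E REAL 0.010615711 0.959660297 0.010615711 0.0476 0.0061 ClNonZ
F TEST 0.704883227 0.265392781 0.010458211 0.0865 0.0548 ClNonZ
EOF

cat > foo2.txt <<'EOF'
A TEST 0.715498938 0 0.265392781 -0.0013 -0.0309 Unkn
B REAL 0.927813163 0 0.053078556 -0.0051 -0.0636 MhZisp
C TEST 0.55626327 0.222929936 0.201698514 0.0053 -0.0438 MhZisp
D REAL 0.492569002 0.350318471 0.138004246 0.0485 0.0088 ClNonZ
E REAL 0.704883227 0.265392781 0.010615711 0.0476 0.0061 AbbbbZ
F REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ
EOF

# report_matches is a hypothetical helper: the question's loop with quoting fixed.
report_matches() {
  for file in "$@"; do
    # grep -n prefixes each hit with "N:", so field 2 is still REAL or TEST.
    line=$(grep -n ClNonZ "$file" | awk '$2 == "REAL"')
    if [ -n "$line" ]; then
      echo "$file"
      echo "$line"   # quoted, so the embedded newlines survive
      echo
    fi
  done
}

report_matches foo*.txt
```

With the quotes in place, each match should appear on its own line under its filename, as in the single-file grep run from the question.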
However, in this case, I'd be tempted to replace the entire bash + grep + awk thing, since awk can do it all itself:
awk 'FNR==1 {needheader=1}; ($0 ~ /ClNonZ/ && $2 == "REAL") {if (needheader) {print ""; print FILENAME; needheader=0}; print}' foo*.txt
Explanation:
1. FNR==1 {needheader=1} -- this triggers at the beginning of each file (FNR is the line number within the current file, so if it's 1 this is the beginning of a file) and sets a variable saying that if there's a match, the filename needs to be printed.
2. ($0 ~ /ClNonZ/ && $2 == "REAL") -- if "ClNonZ" appears in the line, and the second field is "REAL", then do the following stuff in { }. Note: do you actually want to search the entire line for "ClNonZ", or just the last field? If it's just the last field, use $NF == "ClNonZ" instead.
3. if (needheader) {print ""; print FILENAME; needheader=0} -- if this is the first match within this file, print a blank line and the filename, then clear the variable that says this stuff needs to be printed.
4. print -- ...and print the line. Note that $0 is implicit here, and since this is still in the { } from step 2, it only happens if the line matched.
5. foo*.txt -- just pass all the matching filenames to awk as arguments, and let it scan over all of them in a big batch.
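The question also asked for line numbers, which grep -n supplied in the original pipeline. One possible variant of the awk command above, offered as a sketch: it prefixes each match with FNR and, on the assumption that the pattern is only wanted in the last column, tests $NF instead of the whole line. The function name scan_files and the scratch directory are illustrative; the sample data is copied from the question.

```shell
#!/bin/sh
# Recreate the question's sample files in a scratch directory (illustrative setup).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > foo1.txt <<'EOF'
A TEST 0.959660297 0 0.021231423 -0.0073 -0.0031 MhZisp
B REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ
C REAL 0.98089172 0 0 -0.0158 0.0124 MhNonZ
D TEST 0.704883227 0.265392781 0.010615711 -0.0087 -0.0092 MhZisp
E REAL 0.010615711 0.959660297 0.010615711 0.0476 0.0061 ClNonZ
F TEST 0.704883227 0.265392781 0.010458211 0.0865 0.0548 ClNonZ
EOF

cat > foo2.txt <<'EOF'
A TEST 0.715498938 0 0.265392781 -0.0013 -0.0309 Unkn
B REAL 0.927813163 0 0.053078556 -0.0051 -0.0636 MhZisp
C TEST 0.55626327 0.222929936 0.201698514 0.0053 -0.0438 MhZisp
D REAL 0.492569002 0.350318471 0.138004246 0.0485 0.0088 ClNonZ
E REAL 0.704883227 0.265392781 0.010615711 0.0476 0.0061 AbbbbZ
F REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ
EOF

# scan_files is a hypothetical wrapper around a single awk invocation.
scan_files() {
  awk '
    # Start of each input file: remember that its name has not been printed yet.
    FNR == 1 { needheader = 1 }

    # Assumption: the pattern must be the last field, not merely somewhere in the line.
    $NF == "ClNonZ" && $2 == "REAL" {
      if (needheader) { print ""; print FILENAME; needheader = 0 }
      print FNR ":" $0   # FNR is the line number within the current file
    }
  ' "$@"
}

scan_files foo*.txt
```

Because awk reads the files itself, no shell variable ever holds the matched lines, so the word-splitting problem cannot arise in this form.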
CodePudding user response:
Try rq (https://github.com/fuyuncat/rquery/releases).
The command below is easy to understand: it searches all the files and returns the lines where any column equals 'ClNonZ' and the 2nd column equals 'REAL'.
[ rquery]$ ./rq -q "s @raw | f anycol(1,%,$)='ClNonZ' and @2='REAL'" samples/foo*
B REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ
E REAL 0.010615711 0.959660297 0.010615711 0.0476 0.0061 ClNonZ
D REAL 0.492569002 0.350318471 0.138004246 0.0485 0.0088 ClNonZ
F REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ