Is there a way, on AIX 6.x, to use a find command to locate all the files under a certain directory (including subdirectories), call an external command (e.g. hlcat) on each one to display/convert it into a readable format, and pipe the result through grep to search for a pattern, without using loops in the shell?
e.g. find . -type f -name "*.hl7" -exec hlcat {} | grep -l "pattern" \;
The above command does not work, so I have to use a while loop to display the content and search for the pattern, as follows:
find . -type f -name "*.hl7" -print | while read file; do
hlcat $file | grep -l "pattern";
done
At the same time, these HL7 files have been renamed with round brackets in their names, which prevents them from being opened unless the file name is wrapped in double quotes.
e.g. hlcat (patient) filename.hl7 will fail to open.
hlcat "(patient) filename.hl7" will work.
In short, I am looking for a clean, concise one-liner built around the find command that can display and search the content of these HL7 files with round brackets in their names.
Many thanks, George
P.S. Raw HL7 data is made up of one continuous line and is not readable unless it is converted into a readable format using a tool such as hlcat.
CodePudding user response:
1. Finding the HL7 files that contain pattern:
- with bash:
#!/bin/bash
find . -type f -name '*.hl7' -exec printf '%s\0' {} + |
while IFS='' read -r -d '' filepath
do
hlcat "$filepath" | egrep -q 'pattern' && printf '%s\n' "$filepath"
done
note: AIX find doesn't have the -print0 option
- or with a script inside find:
find . -type f -name '*.hl7' -exec sh -c '
for f; do
hlcat "$f" | egrep -q pattern && printf "%s\n" "$f";
done
' _ {} +
note: pattern has to be embedded correctly in the script text, which will probably make it less readable
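One way to avoid splicing the pattern into the script text is to hand it to sh as a positional parameter placed before the file names. The following is only a sketch under assumptions: cat stands in for hlcat, the sample files are made up, and the names find_matching, pattern and demo_dir are illustrative.

```shell
# Sketch only: cat stands in for hlcat; the sample files below are made up.
find_matching() {
    # $1 = directory to search, $2 = extended regular expression
    find "$1" -type f -name '*.hl7' -exec sh -c '
        pattern=$1; shift    # first argument is the pattern, the rest are file names
        for f; do
            cat "$f" | grep -Eq "$pattern" && printf "%s\n" "$f"
        done
    ' _ "$2" {} +
}

# Throwaway directory with three sample files whose names contain round brackets.
demo_dir=${TMPDIR:-/tmp}/hl7_pattern_demo.$$
mkdir -p "$demo_dir"
printf 'Barry\n'  > "$demo_dir/(Barry) fileX.hl7"
printf 'John\n'   > "$demo_dir/(John) fileY.hl7"
printf 'Jolene\n' > "$demo_dir/(Jolene) fileZ.hl7"

matches=$(find_matching "$demo_dir" 'Barry|Jolene' | sort)
printf '%s\n' "$matches"
rm -rf "$demo_dir"
```

Because the pattern arrives as "$1" inside the inner script, it needs no extra quoting or escaping in the script body, whatever characters it contains.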
2. Finding pattern in the HL7 files:
- as @Philippe suggested:
find . -type f -name '*.hl7' -exec hlcat {} \; | grep 'pattern'
- or if hlcat supports multiple files as arguments:
find . -type f -name '*.hl7' -exec hlcat {} + | grep 'pattern'
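The two -exec terminators differ in how often the command is run: \; starts it once per file, while + passes as many file names as fit to a single invocation. A small sketch that counts the arguments each invocation receives, assuming a find that supports the POSIX {} + form; the files and the demo_dir, per_file and batched names are made up for the demonstration.

```shell
# Throwaway directory with three sample files (names are made up).
demo_dir=${TMPDIR:-/tmp}/hl7_exec_demo.$$
mkdir -p "$demo_dir"
for name in '(a) one' '(b) two' '(c) three'; do
    printf 'x\n' > "$demo_dir/$name.hl7"
done

# With \; the command runs once per file, so every run sees exactly one argument.
per_file=$(find "$demo_dir" -type f -name '*.hl7' -exec sh -c 'echo "$#"' _ {} \; | paste -sd ' ' -)

# With + the file names are batched, so a single run sees all three here.
batched=$(find "$demo_dir" -type f -name '*.hl7' -exec sh -c 'echo "$#"' _ {} +)

printf 'per file: %s\n' "$per_file"
printf 'batched:  %s\n' "$batched"
rm -rf "$demo_dir"
```

With three files this prints "1 1 1" for the \; form and "3" for the + form, which is why the + form is only suitable when the command accepts several files at once.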
Update: Replacing hlcat with cat to show you how it works
Consider the following files with simple content:
filename: (Barry) fileX.hl7
content: Barry
filename: (John) fileY.hl7
content: John
filename: (Jolene) fileZ.hl7
content: Jolene
The command:
find . -type f -name '*.hl7' -exec cat {} \; | egrep -l 'Barry|Jolene'
Outputs:
(standard input)
That means that egrep found the pattern... in its standard input! That's all it can do, and your awk script will do something equivalent.
While the commands:
find . -type f -name '*.hl7' -exec sh -c '
for f; do
cat "$f" | egrep -q "Barry|Jolene" && printf "%s\n" "$f";
done
' _ {} +
and
#!/bin/bash
find . -type f -name '*.hl7' -exec printf '%s\0' {} + |
while IFS='' read -r -d '' filepath
do
cat "$filepath" | egrep -q 'Barry|Jolene' && printf '%s\n' "$filepath"
done
both output:
./(Jolene) fileZ.hl7
./(Barry) fileX.hl7
Which are the files that contain the pattern.
Now if you're fine without robust handling of filenames, then your example with the while loop could work, provided that you add double quotes when expanding the filename:
find . -type f -name '*.hl7' -print | while read file; do
cat "$file" | egrep -q 'Barry|Jolene' && printf '%s\n' "$file"
done
outputs:
./(Jolene) fileZ.hl7
./(Barry) fileX.hl7
But what's the point of not creating bullet-proof code when you can?
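To illustrate what "robust" means here: a plain read strips leading blanks and treats backslashes as escape characters, so an unusual file name can reach the loop body altered. The sketch below uses a made-up name and compares a plain read with IFS= read -r; the variable names are illustrative. Neither form copes with a newline inside a file name, which is what the NUL-delimited variant is for.

```shell
# A made-up file name with leading spaces and a backslash in it.
name='  (odd) file\name.hl7'

# Plain read: drops the leading blanks and consumes the backslash.
plain=$(printf '%s\n' "$name" | { read file; printf '%s' "$file"; })

# IFS= read -r: keeps the line exactly as it was written.
exact=$(printf '%s\n' "$name" | { IFS= read -r file; printf '%s' "$file"; })

printf 'plain: [%s]\n' "$plain"
printf 'exact: [%s]\n' "$exact"
```

Only the second form hands the original name to the loop body, so only that form would let the quoted "$file" expansion open the right file.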
CodePudding user response:
A solution inspired by your awk example:
find . -type f -name '*.hl7' -exec awk -v regexp='^(Barry|Jolene)$' -F '|' '
FNR == 1 { if(found) print filename; found = 0; filename = FILENAME }
$1 == "PID" && $5 ~ regexp { found = 1 }
END { if (found) print filename }
' {} +
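The awk approach can be tried on throwaway data. The sketch below assumes simplified, made-up records with one pipe-delimited segment per line, where field 1 is the segment name and field 5 holds the value to test; real HL7 files may be laid out differently, and the demo_dir and matches names are illustrative.

```shell
# Throwaway directory with made-up, simplified records (not real HL7 messages).
demo_dir=${TMPDIR:-/tmp}/hl7_awk_demo.$$
mkdir -p "$demo_dir"
printf 'MSH|a|b|c|d\nPID|1|2|3|Barry\n'  > "$demo_dir/(Barry) fileX.hl7"
printf 'MSH|a|b|c|d\nPID|1|2|3|John\n'   > "$demo_dir/(John) fileY.hl7"
printf 'MSH|a|b|c|d\nPID|1|2|3|Jolene\n' > "$demo_dir/(Jolene) fileZ.hl7"

# For each file, remember whether any PID line has a matching fifth field,
# and print the file name when the next file starts or at the end of input.
matches=$(
    find "$demo_dir" -type f -name '*.hl7' -exec awk -v regexp='^(Barry|Jolene)$' -F '|' '
        FNR == 1 { if (found) print filename; found = 0; filename = FILENAME }
        $1 == "PID" && $5 ~ regexp { found = 1 }
        END { if (found) print filename }
    ' {} + | sort
)
printf '%s\n' "$matches"
rm -rf "$demo_dir"
```

This lists the Barry and Jolene files and skips the John file. Note that awk reads the files directly here, so nothing is passed through hlcat.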