find /folder/202205??/ -type f | xargs head -50| grep '^Starting'
There are folders named 20220501, 20220502, 20220503, and so on. This command searches the first 50 lines of every file under '/folder/202205??/' and shows the lines beginning with the text "Starting".
The output doesn't include the path and filename of the files matched by the grep command. How can I get the path, the filename, and the matched line with a simple command?
CodePudding user response:
The main problem here is that head doesn't pass on the info about which lines came from which file, so grep can pick out the matching lines but not show the file name or path. awk can do both the matching and the trimming to 50 lines, and you can control exactly what gets printed for each match. So something like this:
find /folder/202205??/ -type f -exec awk '/^Starting/ {print FILENAME ": " $0}; (FNR>=50) {nextfile}' {} +
Explanation: the first clause in the awk script prints matching lines (prefixed by the FILENAME, which will actually include the path as well), and the second skips to the next file when it gets to line 50. Also, I used find's -exec ... feature instead of xargs, just because it's a bit cleaner (and won't run into trouble with weird filenames). Terminating the -exec command with + instead of \; makes it run the files in batches (like xargs) rather than one at a time.
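One caveat: nextfile is not available in every awk (some older implementations lack it). Here is a minimal sketch of a variant that avoids it by putting the line limit into the match condition, at the cost of awk still reading each file to the end. It also adds the line number (FNR) to the output, grep-style. The temporary directory and the file names a.log and b.log are made up purely for the demo.

```shell
#!/bin/sh
# Throwaway test data; directory layout and file names are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/20220501" "$demo/20220502"

# a.log: one match on line 1, another on line 60 (outside the first 50 lines)
{
  echo "Starting job A"
  awk 'BEGIN { for (i = 2; i <= 59; i++) print i }'
  echo "Starting too late"
} > "$demo/20220501/a.log"

# b.log: a single match on line 3
printf '%s\n' "x" "y" "Starting job B" > "$demo/20220502/b.log"

# Print path:line:text for matches that fall within the first 50 lines.
# There is no nextfile here, so every file is read completely.
result=$(find "$demo"/202205?? -type f -exec awk \
    'FNR <= 50 && /^Starting/ { print FILENAME ":" FNR ":" $0 }' {} + | sort)

printf '%s\n' "$result"
rm -rf "$demo"
```

On this sample data only the matches on line 1 of a.log and line 3 of b.log are printed; the match on line 60 is ignored. For large files the nextfile version is preferable, since it stops reading after line 50.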
CodePudding user response:
A relatively portable awk-based solution that provides built-in realpath variant detection, shell-safe single-quotation (and escaping) for filenames, and a grep-like output format: file-full-realpath:line-number:[matched line contents..]
————————————————————————————————————————
gfind 202…………/ -mindepth 1
-type f
-not -empty
-not -name ".*" -print0 |
xargs -0 -n 20 -P 16 dash -c 'nice [mg]awk -e '\''
# gawk profile, created Fri May 6 23:26:31 2022
# BEGIN rule(s)
BEGIN {
1 __=substr("grealpath", 2^0^system("exit \140 which "\
"grealpath | grep -m 1 -ce . \140 "))
1 FS="^Starting"
}
# Rule(s)
1020 50 < FNR { # 20
20 nextfile
}
1000 FNR == 1 { # 20
20 _ = getpath(FILENAME, __)
}
1000 -NF < -sub("^",(_)":"(FNR)":",$0) {
print
}
20 function getpath(_,____,__,___)
{
20 return "-"==_ \
? "/dev/stdin" \
: substr((___=RS)*(RS="\0")*gsub(/\47/,"\47\134&\47",_),
\
((__=(____)" -zePq \47"(_)"\47 ")|getline _)~"",
__*close(__)^(RS=___))(_)
}'\'' "${@}" ' _
CodePudding user response:
I am sure this is not perfect. But it might give some new ideas.
Be aware that filenames containing special characters such as newlines are not handled correctly by this solution!
while IFS=: read -r -a a; do [[ ${a[1]} -gt 50 ]] && break; printf "%s\n" "${a[0]}"; done < <( grep -rnH '^Starting' /folder/202205??/ | sort -t":" -k2,2n )
This bash snippet is written on one line; pretty-printed, it spans several:
while IFS=: read -r -a a; do
[[ ${a[1]} -gt 50 ]] && break
printf "%s\n" "${a[0]}"
done < <( grep -rnH '^Starting' /folder/202205??/ | sort -t":" -k2,2n )
grep can recurse through directories with -r, and it shows the line number with -n and the filename with -H. The sort is done on the line number. The loop stops at the first line number greater than 50; until then it prints the filename.
Depending on what you want, you can also output the line number and/or the matched string.
If you need the information inside something else, where the line number can be handled, the simple grep might lead you to a better solution:
grep -rnH '^Starting' /folder/202205??/
I am sure the output can be piped into something like awk, which would drop every line whose second field (the line number) is greater than 50. Unfortunately I am no awk expert.
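A rough sketch of that awk idea, assuming no path contains a colon (otherwise the line number is no longer the second field): split each grep output line on ":" and keep only those whose line number is at most 50. The temporary directory and the file a.log are invented for the demo.

```shell
#!/bin/sh
# Throwaway test data; directory layout and file name are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/20220501"

# a.log: one match on line 1, another on line 60
{
  echo "Starting early"
  awk 'BEGIN { for (i = 2; i <= 59; i++) print i }'
  echo "Starting late"
} > "$demo/20220501/a.log"

# grep prints path:line:text; awk keeps rows whose second field is <= 50.
filtered=$(grep -rnH '^Starting' "$demo" | awk -F: '$2 <= 50')

printf '%s\n' "$filtered"
rm -rf "$demo"
```

Note that grep still scans every file in full here; the awk filter only discards the late matches afterwards, and no sorting is needed.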