I have a for loop that searches for *.log files in a directory and then looks for a "pattern" in each log file. If the "pattern" is found, it does something with that log file.
There are 10k log files in the directory and their sizes vary (some are ~1 GB, some only a few MB), and my script takes 1 hour to run. What are some ways I can improve the performance of my code?
One thing I could think of: get rid of duplicate log files.
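Roughly, the loop looks like this (simplified sketch; do_something stands in for the real processing step):
for f in /path/to/logs/*.log; do
    # grep -q stops at the first match and just reports success/failure.
    if grep -q "pattern" "$f"; then
        do_something "$f"
    fi
done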
Thanks in advance.
CodePudding user response:
Log files have a nice property: they are only appended to, and log lines usually contain timestamps. Unfortunately, exploiting that requires programming beyond a simple shell script.
Use tail
A tool running in the background and doing a tail of the last ~100 lines should already be able to detect duplicate logging. I am afraid that something like perl is needed to glue the logic together.
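A rough sketch of that idea in plain bash (one possible interpretation, shown without perl; the paths and the 100-line window are illustrative): fingerprint only the tail of each file, so likely duplicate log files can be spotted without reading multi-GB files in full.
#!/usr/bin/env bash
# Sketch: use the last 100 lines of each log as a cheap fingerprint
# to flag files that are probably duplicates of one another.
declare -A seen                      # needs bash 4+ for associative arrays
for f in /path/to/logs/*.log; do
    sig=$(tail -n 100 "$f" | md5sum | cut -d' ' -f1)
    if [ -n "${seen[$sig]}" ]; then
        echo "$f looks like a duplicate of ${seen[$sig]}"
    else
        seen[$sig]="$f"
    fi
done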
CodePudding user response:
"a loop, looking for log-files, and tries to find something inside"?
What's wrong with a simple grep
?
You can find all logfiles with a pattern, using this command:
grep -r -l "pattern" *.log
-r makes sure you also look inside subdirectories
-l only shows the filenames, not the actual lines with the pattern (obviously, you first need to do the search without the -l in order to check that your pattern is found correctly)
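The filename list that grep -l prints can then drive the "do something" step, for example (a sketch; process_log is a placeholder for the real action):
grep -l "pattern" *.log | while IFS= read -r f; do
    process_log "$f"    # runs once per matching log file
done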
Another option is find ./ -name "*.log" -exec grep -l "pattern" {} + (the + passes many files to each grep invocation instead of starting one grep per file); you can add an extra pipe for performing some action.
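For example (just a sketch; /path/to/matched/ is an illustrative destination, and file names are assumed not to contain newlines):
# Move every log file that contains the pattern into a separate directory.
find ./ -name "*.log" -exec grep -l "pattern" {} + | xargs -I {} mv {} /path/to/matched/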