Some ways to improve performance of this code


So, I have a for loop that searches for *.log files in a directory and then looks for a "pattern" in each log file. If the "pattern" is found, it does something with the log file.

There are 10k log files in the directory and their sizes vary (some are ~1 GB, some are a few MB); my script takes 1 hour to run. What are some ways I can improve my code's performance?

One idea I could think of: get rid of duplicate log files.

Thanks in advance.

CodePudding user response:

Log files have one nice property: they are usually only appended to. And log lines probably contain timestamps.

Unfortunately, exploiting that requires more programming than a simple shell script.

Use tail

A tool running in the background and tailing the last ~100 lines of each file should already be able to detect duplicate logging. I am afraid that something like perl is needed to glue the logic together.
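
As a minimal sketch of that idea, assuming a single log file at a hypothetical path /var/log/app.log and that "duplicate logging" means byte-identical lines:

# follow the last 100 lines of the file and flag any line seen before
tail -n 100 -F /var/log/app.log | perl -ne 'print "duplicate: $_" if $seen{$_}++'

In practice the timestamps mentioned above would defeat a literal comparison, so you would strip the timestamp field from each line before using it as the hash key.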

CodePudding user response:

"a loop, looking for log-files, and tries to find something inside"?

What's wrong with a simple grep?

You can find all log files that contain a pattern with this command:

grep -r -l "pattern" *.log
  • -r makes sure you also look inside subdirectories
  • -l shows only the filenames, not the actual lines with the pattern (obviously, you should first run the search without -l to check that your pattern matches correctly)
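
The question says you then want to "do something" with the matching log files. A minimal sketch of that step, where process is a hypothetical placeholder for your per-file action, not a real command:

grep -rl "pattern" *.log | while IFS= read -r f; do
    process "$f"   # hypothetical placeholder for your per-file action
done

Because -l stops reading a file at the first match, this is already much cheaper than scanning every file to the end.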

Another option is find . -name "*.log" -exec grep -l "pattern" {} + (note the terminating +, which find requires and which batches many filenames into each grep invocation); you can then add an extra pipe to perform some action on the results.
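
Since the question is about speed, one variation worth trying is splitting the work across several grep processes. This sketch assumes GNU find and xargs (the -P flag sets the number of parallel processes):

# NUL-separated names are safe for any filename; -P 4 runs 4 greps at once
find . -name "*.log" -print0 | xargs -0 -P 4 grep -l "pattern" > matching_files.txt

Note that scanning 10k files, some of them ~1 GB, is mostly I/O-bound, so adding far more processes than your disks can feed will not help; measure before settling on a number.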
