I have a for loop that searches for *.log files in a directory and then looks for a "pattern" in each log file. If the "pattern" is found, it does something with that log file.
There are 10k log files in the directory and their sizes vary (some are ~1 GB, some only a few MB), and my script takes 1 hour to run. What are some ways I can improve the performance of my code?
One thing I could think of: get rid of duplicate log files.
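Roughly, the loop looks like this (simplified sketch; do_something stands in for the real processing step):
for f in /path/to/logs/*.log; do
    # grep -q stops at the first match and just reports success/failure.
    if grep -q "pattern" "$f"; then
        do_something "$f"
    fi
done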
Thanks in advance.
CodePudding user response:
Log files have a nice property: they are only appended to, and log lines usually contain timestamps. Unfortunately, exploiting that requires programming beyond a simple shell script.
Use tail
A tool running in the background and doing a tail of the last ~100 lines should already be able to detect duplicate logging. I am afraid that something like perl is needed to glue the logic together.
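A rough sketch of that idea in plain bash (one possible interpretation, shown without perl; the paths and the 100-line window are illustrative): fingerprint only the tail of each file, so likely duplicate log files can be spotted without reading multi-GB files in full.
#!/usr/bin/env bash
# Sketch: use the last 100 lines of each log as a cheap fingerprint
# to flag files that are probably duplicates of one another.
declare -A seen                      # needs bash 4+ for associative arrays
for f in /path/to/logs/*.log; do
    sig=$(tail -n 100 "$f" | md5sum | cut -d' ' -f1)
    if [ -n "${seen[$sig]}" ]; then
        echo "$f looks like a duplicate of ${seen[$sig]}"
    else
        seen[$sig]="$f"
    fi
done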
CodePudding user response:
"a loop, looking for log-files, and tries to find something inside"?
What's wrong with a simple grep
?
You can find all logfiles with a pattern, using this command:
grep -r -l "pattern" *.log
-r makes sure you also look inside subdirectories
-l only shows the filenames, not the actual lines with the pattern (obviously, you first need to do the search without the -l in order to check that your pattern is found correctly)
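The filename list that grep -l prints can then drive the "do something" step, for example (a sketch; process_log is a placeholder for the real action):
grep -l "pattern" *.log | while IFS= read -r f; do
    process_log "$f"    # runs once per matching log file
done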
Another option is find ./ -name "*.log" -exec grep -l "pattern" {} + (the + passes many files to each grep invocation instead of starting one grep per file); you can add an extra pipe for performing some action.
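For example (just a sketch; /path/to/matched/ is an illustrative destination, and file names are assumed not to contain newlines):
# Move every log file that contains the pattern into a separate directory.
find ./ -name "*.log" -exec grep -l "pattern" {} + | xargs -I {} mv {} /path/to/matched/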