What is the fastest way to read/filter a text file

Time:05-17

I'm trying to loop through a text log file containing SSH logins and other entries.

The program returns the total number of SSH logins.

My solution works but seems a bit slow (~3.5 seconds on a 200 MB file). I would like to know if there is any way to make it faster. I'm not really familiar with good practices in Java.

I'm using the BufferedReader class. Maybe there are better classes or methods, but everything else I found online was slower.

{
            BufferedReader br;
            if(fileLocation != null) {
                br = new BufferedReader(new FileReader(fileLocation));
            }
            else {
                br = new BufferedReader((new InputStreamReader(System.in, "UTF-8")));
            }
            String line;
            Stack<String> users = new Stack<>();
            int succeeded = 0;
            int failed;
            int total = 0;

            if(!br.ready()) {
                help("Cannot read the file", true);
            }
            while((line=br.readLine())!=null)
            {
                if(!line.contains("sshd")) continue;
                String[] arr = line.split("\\s+");
                if(arr.length < 11) continue;


                String log = arr[4];
                String log2 = arr[5];
                String log3 = arr[8];
                String user = arr[10];
                if(!log.contains("sshd")) continue;
                if(!log2.contains("Accepted")) {
                    if(log3.contains("failure")) {
                        total++;
                    }
                    continue;
                }
                total++;
                succeeded++;

                if(!repeat) {
                    if (users.contains(user)) continue;
                    users.add(user);
                }

                System.out.println((total + 1) + " " + user);
            }


This is a known performance problem in the standard Java library (see "Java split String performances"): String.split with a multi-character regex compiles a new Pattern on every call.

So to speed up your code, you need to speed up the split call. The first thing I can suggest is to compile the pattern once, outside the loop, by replacing lines 75-79 of the full code with this:

Pattern pattern = Pattern.compile("\\s+");
while ((line = br.readLine()) != null) {
    if (!line.contains("sshd")) continue;
    String[] arr = pattern.split(line);
    if (arr.length < 11) continue;
...
}

This may speed up the code a bit, but you can see from the profile that a lot of time is still spent in Pattern and Matcher methods. We need to get rid of Pattern and Matcher for a significant speedup.
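To try the precompiled-pattern variant in isolation, here is a minimal, self-contained sketch. The class name `PrecompiledSplitDemo`, the helper `fields`, and the sample log line are made up for illustration and do not come from the original program.

```java
import java.util.regex.Pattern;

final class PrecompiledSplitDemo {
    // Compiled once and reused for every line, instead of letting
    // String.split("\\s+") compile the same regex on each call.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Splits one log line into whitespace-separated fields.
    static String[] fields(String line) {
        return WHITESPACE.split(line);
    }

    public static void main(String[] args) {
        // Hypothetical log line; note the double space before the day.
        String line = "May  7 10:15:32 myhost sshd[1234]: Accepted password for alice";
        String[] arr = fields(line);
        System.out.println(arr.length + " fields, process field: " + arr[4]);
    }
}
```

The regex is still evaluated for every line here; only the repeated compilation is avoided, which is why the gain from this step alone may be modest.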

For a single-character pattern that is not a regex metacharacter, String.split takes a fast path that does not use regex at all and is quite efficient. Let's try replacing the code with:

while ((line = br.readLine()) != null) {
    if (!line.contains("sshd")) continue;
    String[] arr = Arrays.stream(line.split(" "))
                    .filter(s -> !s.isEmpty())
                    .toArray(String[]::new);
    if (arr.length < 11) continue;
...
}

This code runs almost twice as fast on the same data.
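Going one step further than the answer, the stream and the intermediate array containing empty strings can be avoided by scanning for spaces with `indexOf`. This is only a sketch of that idea, not something from the original post: the class `ManualSplitDemo`, the method `splitOnSpaces`, and the sample line are hypothetical, and whether it is actually faster on your data should be measured rather than assumed. Like `split(" ")`, it splits on the space character only, not on tabs as `\\s+` would.

```java
import java.util.ArrayList;
import java.util.List;

final class ManualSplitDemo {
    // Splits on runs of ' ' without regex, skipping empty tokens.
    static String[] splitOnSpaces(String line) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int len = line.length();
        while (start < len) {
            int end = line.indexOf(' ', start);
            if (end == -1) {
                end = len;                     // last token runs to the end of the line
            }
            if (end > start) {                 // skip empty tokens between consecutive spaces
                tokens.add(line.substring(start, end));
            }
            start = end + 1;
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Hypothetical log line; note the double space before the day.
        String line = "May  7 10:15:32 myhost sshd[1234]: Accepted password for alice";
        for (String field : splitOnSpaces(line)) {
            System.out.println(field);
        }
    }
}
```

For input without tabs, this returns the same fields as the `split(" ")` plus `filter` version, so the rest of the loop can stay unchanged.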
