Regex pattern to exclude whitespaces

Time:09-22

I have lines in the following format:

9/14/2021 6:01:14 PM   42 (3224)   Receive rate: 39338 B/s
9/14/2021 6:01:29 PM   92 (940)   Receive rate: 215363 B/s

I need to extract two pieces of data from each line, the timestamp and the actual rate, e.g.:

9/14/2021 6:01:14 PM, 39338
9/14/2021 6:01:29 PM, 215363

I am using capturing groups and came up with the following pattern:

^(.*)\s*[0-9]*\s+\([0-9]+\)\s+Receive\s+rate:\s+([0-9]+)

With this pattern, my second group is returned correctly (39338, 215363), but the first group extends past the AM/PM marker and becomes 9/14/2021 6:01:14 PM 42.

If I change the pattern to

^(.*)   [0-9]*\s+\([0-9]+\)\s+Receive\s+rate:\s+([0-9]+)

Here the first \s* is replaced by three literal spaces, and it matches as expected. However, there is no guarantee that there will always be exactly three spaces, so I need to use a whitespace character class with a zero-or-more quantifier there.
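The behaviour described above comes from (.*) being greedy: it first consumes the whole line and then gives characters back only until the rest of the pattern can match. Because \s* and [0-9]* are both allowed to match nothing, the longest group 1 that still lets the pattern succeed keeps 42 inside it. A minimal sketch of one possible fix, assuming only the group itself needs to change: make it reluctant with (.*?) and use \s+ so any number of whitespace characters is accepted. The class name LazyGroupDemo and the method extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyGroupDemo {
    // (.*?) is reluctant: it stops at the first position from which the rest
    // of the pattern can match, so "42" is no longer pulled into group 1.
    // \s+ requires at least one whitespace character, whatever the count.
    static final Pattern LINE = Pattern.compile(
            "^(.*?)\\s+\\d+\\s+\\(\\d+\\)\\s+Receive\\s+rate:\\s+(\\d+)",
            Pattern.MULTILINE);

    // Returns one "timestamp, rate" string per matching line.
    static List<String> extract(String input) {
        List<String> rows = new ArrayList<>();
        Matcher m = LINE.matcher(input);
        while (m.find()) {
            rows.add(m.group(1) + ", " + m.group(2));
        }
        return rows;
    }

    public static void main(String[] args) {
        String input = "9/14/2021 6:01:14 PM   42 (3224)   Receive rate: 39338 B/s\n"
                + "9/14/2021 6:01:29 PM   92 (940)   Receive rate: 215363 B/s";
        extract(input).forEach(System.out::println);
    }
}
```

Running main should print the two lines in the requested "timestamp, rate" format.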

Answer:

The regex used:

"^(\\d{1,2}/\\d{1,2}/\\d{4}\\s.*?)\\s{3}.+?\\s{3}Receive\\s+rate:\\s+(\\d+)"

Regex in context and testbench:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String input = "9/14/2021 6:01:14 PM   42 (3224)   Receive rate: 39338 B/s\n"
                + "9/14/2021 6:01:29 PM   92 (940)   Receive rate: 215363 B/s";

        String regex = "^(\\d{1,2}/\\d{1,2}/\\d{4}\\s.*?)\\s{3}.+?\\s{3}Receive\\s+rate:\\s+(\\d+)";
        // MULTILINE lets ^ match at the start of every line, not only at the start of the input
        Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            System.out.printf("Time stamp: '%s' || Rate: '%s'%n", matcher.group(1), matcher.group(2));
        }
    }
}

Output:

Time stamp: '9/14/2021 6:01:14 PM' || Rate: '39338'
Time stamp: '9/14/2021 6:01:29 PM' || Rate: '215363'

More regular-expression constructs can be found here:

https://docs.oracle.com/javase/10/docs/api/java/util/regex/Pattern.html
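The question notes that the number of spaces is not guaranteed, while the regex above requires exactly \s{3}. A minimal sketch of a variant, assuming columns are always separated by at least two whitespace characters while the date, time and AM/PM marker are separated by single spaces, replaces each \s{3} with \s{2,}. The class name FlexibleGapDemo and the method parseLine are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlexibleGapDemo {
    // Same structure as the regex above, but each column gap is \s{2,}
    // (two or more whitespace characters) instead of exactly \s{3}.
    // ASSUMPTION: columns are separated by at least two whitespace characters,
    // while date, time and AM/PM are separated by single spaces.
    static final Pattern LINE = Pattern.compile(
            "^(\\d{1,2}/\\d{1,2}/\\d{4}\\s.*?)\\s{2,}.+?\\s{2,}Receive\\s+rate:\\s+(\\d+)");

    // Returns {timestamp, rate}, or null when the line does not match.
    static String[] parseLine(String line) {
        Matcher m = LINE.matcher(line);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        // Uneven column gaps on purpose.
        String[] lines = {
            "9/14/2021 6:01:14 PM  42 (3224)     Receive rate: 39338 B/s",
            "9/14/2021 6:01:29 PM    92 (940)  Receive rate: 215363 B/s"
        };
        for (String line : lines) {
            String[] parts = parseLine(line);
            if (parts != null) {
                System.out.printf("%s, %s%n", parts[0], parts[1]);
            }
        }
    }
}
```

The gap has to be at least two characters wide: with a plain \s+ in both places, the reluctant .*? would stop at the single space before PM, and group 1 would come back as 9/14/2021 6:01:14 without the AM/PM marker.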

Answer:

Retrieve the Date-Time and Receive-Rate parts separately.

I suggest retrieving the Date-Time and Receive-Rate parts separately: use the rich java.time API for the Date-Time part, and the Java regex API for the Receive-Rate part.

Retrieve the Date-Time part

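You can use DateTimeFormatter#parse(CharSequence, ParsePosition) to parse the string into a TemporalAccessor, from which a LocalDateTime can be obtained. This overload starts at the given ParsePosition and does not require the whole string to be consumed, so the text after the timestamp does not cause a parse error.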
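To illustrate what the ParsePosition overload does, here is a small sketch (the class ParsePositionDemo and the method parseLeading are hypothetical) that parses only the leading timestamp and then reports where parsing stopped; the remainder of the line could then be handed to a regex.

```java
import java.text.ParsePosition;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class ParsePositionDemo {
    static final DateTimeFormatter DTF =
            DateTimeFormatter.ofPattern("M/d/uuuu h:mm:ss a", Locale.ENGLISH);

    // Parses the date-time starting at pos and advances pos past it.
    // Text after the timestamp is ignored rather than causing an exception.
    static LocalDateTime parseLeading(String line, ParsePosition pos) {
        return LocalDateTime.from(DTF.parse(line, pos));
    }

    public static void main(String[] args) {
        String line = "9/14/2021 6:01:14 PM   42 (3224)   Receive rate: 39338 B/s";
        ParsePosition pos = new ParsePosition(0);
        LocalDateTime ldt = parseLeading(line, pos);
        System.out.println(ldt);                                   // 2021-09-14T18:01:14
        System.out.println(pos.getIndex());                        // 20
        System.out.println(line.substring(pos.getIndex()).trim()); // 42 (3224)   Receive rate: 39338 B/s
    }
}
```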

Learn more about the modern Date-Time API* from Trail: Date Time.

Retrieve the Receive-Rate part

You can use the regex (?<=(?:Receive rate: ))\d+(?=(?: B\/s)), where (?<=(?:Receive rate: )) is a positive lookbehind and (?=(?: B\/s)) is a positive lookahead. Both are zero-width, so only the digits between them are returned as the match.
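As a side note, the (?:...) wrappers inside the lookarounds and the escaped \/ are not needed in Java regex, so a shorter form should behave the same. This is a sketch under that assumption; the class LookaroundDemo and the method receiveRate are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Simplified form of the answer's regex: no non-capturing groups inside
    // the lookarounds, and "/" written without a backslash.
    static final Pattern RATE = Pattern.compile("(?<=Receive rate: )\\d+(?= B/s)");

    // Returns the rate digits, or "" when the line has no rate.
    static String receiveRate(String line) {
        Matcher m = RATE.matcher(line);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(receiveRate("9/14/2021 6:01:14 PM   42 (3224)   Receive rate: 39338 B/s")); // 39338
        System.out.println(receiveRate("no rate on this line").isEmpty()); // true
    }
}
```

Compiling the Pattern once into a static field, rather than on every call, avoids recompiling it for each line.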

Complete Demo:

import java.text.ParsePosition;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class Main {
    public static void main(String[] args) {
        // Test
        Stream.of(
                "9/14/2021 6:01:14 PM   42 (3224)   Receive rate: 39338 B/s",
                "9/14/2021 6:01:29 PM   92 (940)   Receive rate: 215363 B/s"
        )
        .forEach(s -> System.out.printf(
                            "Timestamp: %s, Receive rate: %s%n", 
                            getTimestampPart(s),
                            getReceiveRate(s)
        ));
    }

    static String getTimestampPart(String str) {
        DateTimeFormatter dtf = DateTimeFormatter.ofPattern("M/d/uuuu h:mm:ss a", Locale.ENGLISH);
        return LocalDateTime.from(dtf.parse(str, new ParsePosition(0))).format(dtf);
    }

    static String getReceiveRate(String str) {
        Matcher matcher = Pattern.compile("(?<=(?:Receive rate: ))\\d+(?=(?: B\\/s))").matcher(str);
        return matcher.find() ? matcher.group() : "";
    }
}

Output:

Timestamp: 9/14/2021 6:01:14 PM, Receive rate: 39338
Timestamp: 9/14/2021 6:01:29 PM, Receive rate: 215363



* If for any reason you have to stick to Java 6 or Java 7, you can use ThreeTen-Backport, which backports most of the java.time functionality to Java 6 and 7. If you are working on an Android project and your Android API level is still not compliant with Java 8, check Java 8 APIs available through desugaring and How to use ThreeTenABP in Android Project.

Answer:

In this case, you can be more specific:

(\d{1,2}/\d{1,2}/\d{1,4}\s+\d{1,2}:\d{1,2}:\d{1,2}\s+PM)\s+\d+\s+\(\d*?\)\s+Receive\s+rate\:\s+(\d*)\s+B/s\s*\n

Where:

  • \d{1,2}/\d{1,2}/\d{1,4}\s+ is the date, with \d{1,2} meaning "at least one and at most two digits", followed by
  • \d{1,2}:\d{1,2}:\d{1,2}\s+ which is the hh:mm:ss block
  • PM\s+ the PM marker and the whitespace after it
  • \d+\s+\(\d*?\) the digit blocks
  • Receive\s+rate\:\s+(\d*) the "Receive rate" text with the following number block
  • B/s\s*\n the end
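A sketch of the pattern above in Java, with two hypothetical tweaks that go beyond the answer: [AP]M instead of PM, so morning timestamps also match, and \s*$ with Pattern.MULTILINE instead of \s*\n, so the last line still matches when the input does not end with a newline. The class SpecificPatternDemo, the method extract, and the AM sample line are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpecificPatternDemo {
    // The "more specific" pattern with two tweaks (assumptions, not the answer's):
    //  - [AP]M instead of PM, so morning timestamps also match;
    //  - \s*$ plus MULTILINE instead of \s*\n, so the last line matches
    //    even without a trailing newline.
    static final Pattern LINE = Pattern.compile(
            "(\\d{1,2}/\\d{1,2}/\\d{1,4}\\s+\\d{1,2}:\\d{1,2}:\\d{1,2}\\s+[AP]M)"
                    + "\\s+\\d+\\s+\\(\\d*?\\)\\s+Receive\\s+rate:\\s+(\\d*)\\s+B/s\\s*$",
            Pattern.MULTILINE);

    // Returns one "timestamp, rate" string per matching line.
    static List<String> extract(String input) {
        List<String> rows = new ArrayList<>();
        Matcher m = LINE.matcher(input);
        while (m.find()) {
            rows.add(m.group(1) + ", " + m.group(2));
        }
        return rows;
    }

    public static void main(String[] args) {
        // No trailing newline after the last line, and a made-up AM line.
        String input = "9/14/2021 6:01:14 PM   42 (3224)   Receive rate: 39338 B/s\n"
                + "9/15/2021 9:05:00 AM   7 (12)   Receive rate: 1024 B/s";
        extract(input).forEach(System.out::println);
    }
}
```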

Or lazier:

(^[0-9/]*\s+[0-9:]+\s+PM)\s+[0-9 \(\)]+?\s+Receive\s+rate:\s+([0-9]+)\s+B/s\n

With:

  • ^[0-9/]*\s+ a run of digits or / (the date) followed by whitespace, then
  • [0-9:]+\s+ a run of digits or : (the time) followed by whitespace
  • [0-9 \(\)]+?\s+ digits, spaces, ( or ), matched non-greedily, followed by whitespace
  • Receive\s+rate:\s+ literally the text that should be there, with whitespace
  • ([0-9]+)\s+B/s\n the number block and the rest