I have lines in the following format:
9/14/2021 6:01:14 PM 42 (3224) Receive rate: 39338 B/s
9/14/2021 6:01:29 PM 92 (940) Receive rate: 215363 B/s
I need to extract two pieces of data from them, the timestamp and the actual rate, e.g.:
9/14/2021 6:01:14 PM, 39338
9/14/2021 6:01:29 PM, 215363
I am using grouping and came up with the following pattern:
^(.*)\s*[0-9]*\s+\([0-9]+\)\s+Receive\s+rate:\s+([0-9]+)
With this pattern I successfully get the second group (39338, 215363), but the first group goes past the AM/PM marker and becomes 9/14/2021 6:01:14 PM 42.
If I change the pattern to
^(.*)   [0-9]*\s+\([0-9]+\)\s+Receive\s+rate:\s+([0-9]+)
(3 spaces instead of the first \s*), it matches as expected, but there is no guarantee that there will always be exactly 3 spaces, so I need a whitespace character class with "zero or more" (\s*).
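To see why the first group over-reaches, here is a minimal sketch (the class and method names are illustrative) that runs the first pattern above next to a copy whose only change is turning the greedy (.*) into the lazy (.*?).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    static final String LINE = "9/14/2021 6:01:14 PM 42 (3224) Receive rate: 39338 B/s";
    // The first pattern from the question: (.*) is greedy.
    static final String GREEDY = "^(.*)\\s*[0-9]*\\s+\\([0-9]+\\)\\s+Receive\\s+rate:\\s+([0-9]+)";
    // The same pattern with a lazy (.*?) as the only change.
    static final String LAZY = "^(.*?)\\s*[0-9]*\\s+\\([0-9]+\\)\\s+Receive\\s+rate:\\s+([0-9]+)";

    // Returns group 1 of the first match, or null if there is no match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Greedy: (.*) first takes the whole line and gives characters back only until
        // the rest can match. Since \s* and [0-9]* may match nothing, giving back
        // " (3224) Receive rate: 39338 B/s" is already enough, so "42" stays in group 1.
        System.out.println(firstGroup(GREEDY, LINE)); // 9/14/2021 6:01:14 PM 42
        // Lazy: (.*?) grows from the left and stops at the first length where the rest
        // of the pattern can match, which is right after "PM".
        System.out.println(firstGroup(LAZY, LINE));   // 9/14/2021 6:01:14 PM
    }
}
```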
Answer:
Used regex:
"^(\\d{1,2}/\\d{1,2}/\\d{4}\\s.*?)\\s{3}.+?\\s{3}Receive\\s+rate:\\s+(\\d+)"
Regex in context and testbench:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The sample lines use runs of three spaces between the fields
        String input = "9/14/2021 6:01:14 PM   42   (3224)   Receive rate: 39338 B/s\n"
                + "9/14/2021 6:01:29 PM   92   (940)   Receive rate: 215363 B/s";
        String regex = "^(\\d{1,2}/\\d{1,2}/\\d{4}\\s.*?)\\s{3}.+?\\s{3}Receive\\s+rate:\\s+(\\d+)";
        // MULTILINE lets ^ match at the start of every line, not only the first one
        Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            System.out.printf("Time stamp: '%s' || Rate: '%s'%n", matcher.group(1), matcher.group(2));
        }
    }
}
Output:
Time stamp: '9/14/2021 6:01:14 PM' || Rate: '39338'
Time stamp: '9/14/2021 6:01:29 PM' || Rate: '215363'
More regular-expression constructs can be found here:
https://docs.oracle.com/javase/10/docs/api/java/util/regex/Pattern.html
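The regex above relies on runs of exactly three whitespace characters. Since the question says that count is not guaranteed, here is a hedged alternative sketch: it assumes every timestamp ends with an AM or PM marker and lets group 1 end there, so \s+ can absorb any amount of whitespace between the fields. The class name, the helper, and the uneven spacing in the sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AmPmAnchored {
    // Group 1 ends at the AM/PM marker (assumption), so it no longer depends on
    // how many spaces separate the fields; \s+ accepts any run of whitespace.
    static final Pattern LINE = Pattern.compile(
            "^(\\d{1,2}/\\d{1,2}/\\d{4}\\s+\\d{1,2}:\\d{2}:\\d{2}\\s+[AP]M)"
            + "\\s+\\d+\\s+\\(\\d+\\)\\s+Receive\\s+rate:\\s+(\\d+)",
            Pattern.MULTILINE);

    // Returns "timestamp, rate" for every matching line in the input.
    static List<String> extract(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = LINE.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + ", " + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "9/14/2021 6:01:14 PM   42 (3224)  Receive rate: 39338 B/s\n"
                + "9/14/2021 6:01:29 PM 92   (940) Receive rate: 215363 B/s";
        extract(input).forEach(System.out::println);
    }
}
```

Run on the two sample lines, this prints them in the "timestamp, rate" form asked for in the question (9/14/2021 6:01:14 PM, 39338 and 9/14/2021 6:01:29 PM, 215363).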
Answer:
Retrieve the Date-Time and Receive-Rate parts separately.
I suggest you retrieve the Date-Time and Receive-Rate parts separately. To retrieve the Date-Time part, you can use the rich java.time API, and then you can use the Java regex API to retrieve the Receive-Rate part.
Retrieve the Date-Time part
You can use DateTimeFormatter#parse(CharSequence, ParsePosition) to parse the string to a TemporalAccessor, from which a LocalDateTime can be retrieved.
Learn more about the modern Date-Time API* from Trail: Date Time.
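As a minimal sketch of that parse call on its own (the class and helper names are illustrative), the snippet below also prints the ParsePosition index after parsing. That index marks where the timestamp ends, so the remainder of the line could be handed to the regex step.

```java
import java.text.ParsePosition;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class ParsePrefix {
    // Same pattern and locale as in the complete demo's getTimestampPart.
    static final DateTimeFormatter DTF =
            DateTimeFormatter.ofPattern("M/d/uuuu h:mm:ss a", Locale.ENGLISH);

    // Parses the leading timestamp; pos is advanced to the index just after it.
    static LocalDateTime parsePrefix(String line, ParsePosition pos) {
        // Unlike DTF.parse(line), this overload does not require the whole string
        // to be consumed; it throws DateTimeParseException if the prefix is invalid.
        return LocalDateTime.from(DTF.parse(line, pos));
    }

    public static void main(String[] args) {
        String line = "9/14/2021 6:01:14 PM 42 (3224) Receive rate: 39338 B/s";
        ParsePosition pos = new ParsePosition(0);
        LocalDateTime ldt = parsePrefix(line, pos);
        System.out.println(ldt);                                   // 2021-09-14T18:01:14
        System.out.println(pos.getIndex());                        // 20, just after "PM"
        System.out.println(line.substring(pos.getIndex()).trim()); // 42 (3224) Receive rate: 39338 B/s
    }
}
```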
Retrieve the Receive-Rate part
You can use the regex (?<=(?:Receive rate: ))\d+(?=(?: B\/s)), where (?<=(?:Receive rate: )) and (?=(?: B\/s)) are used as the positive lookbehind and positive lookahead patterns, respectively.
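A small sketch comparing the lookaround regex with a plain capturing group (class and method names are illustrative). Both return the digits, but with lookarounds the digits are the whole match, so group() is enough, whereas with a capturing group they have to be read from group(1). The (?:...) wrappers are not needed for the lookarounds to work, so this sketch drops them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundVsGroup {
    // The surrounding text must be present but is not part of the match,
    // so group() (group 0) contains only the digits.
    static final Pattern LOOKAROUND = Pattern.compile("(?<=Receive rate: )\\d+(?= B/s)");
    // Capturing-group equivalent: group 0 also contains the literal text,
    // so the digits are read from group(1).
    static final Pattern CAPTURE = Pattern.compile("Receive rate: (\\d+) B/s");

    static String byLookaround(String s) {
        Matcher m = LOOKAROUND.matcher(s);
        return m.find() ? m.group() : "";
    }

    static String byCapture(String s) {
        Matcher m = CAPTURE.matcher(s);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String s = "9/14/2021 6:01:29 PM 92 (940) Receive rate: 215363 B/s";
        System.out.println(byLookaround(s)); // 215363
        System.out.println(byCapture(s));    // 215363
    }
}
```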
Complete Demo:
import java.text.ParsePosition;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class Main {
    public static void main(String[] args) {
        // Test
        Stream.of(
                "9/14/2021 6:01:14 PM 42 (3224) Receive rate: 39338 B/s",
                "9/14/2021 6:01:29 PM 92 (940) Receive rate: 215363 B/s"
        )
        .forEach(s -> System.out.printf(
                "Timestamp: %s, Receive rate: %s%n",
                getTimestampPart(s),
                getReceiveRate(s)
        ));
    }

    static String getTimestampPart(String str) {
        DateTimeFormatter dtf = DateTimeFormatter.ofPattern("M/d/uuuu h:mm:ss a", Locale.ENGLISH);
        // Parses only the leading timestamp; the rest of the line is ignored
        return LocalDateTime.from(dtf.parse(str, new ParsePosition(0))).format(dtf);
    }

    static String getReceiveRate(String str) {
        Matcher matcher = Pattern.compile("(?<=(?:Receive rate: ))\\d+(?=(?: B\\/s))").matcher(str);
        return matcher.find() ? matcher.group() : "";
    }
}
Output:
Timestamp: 9/14/2021 6:01:14 PM, Receive rate: 39338
Timestamp: 9/14/2021 6:01:29 PM, Receive rate: 215363
* If, for any reason, you have to stick to Java 6 or Java 7, you can use ThreeTen-Backport, which backports most of the java.time functionality to Java 6 & 7. If you are working on an Android project and your Android API level is still not compliant with Java 8, check Java 8 APIs available through desugaring and How to use ThreeTenABP in Android Project.
Answer:
In this case you can be more specific:
(\d{1,2}/\d{1,2}/\d{1,4}\s+\d{1,2}:\d{1,2}:\d{1,2}\s+PM)\s+\d+\s+\(\d*?\)\s+Receive\s+rate\:\s+(\d*)\s+B/s\s*\n
Where:
- \d{1,2}/\d{1,2}/\d{1,4}\s+ is the date pattern, with \d{1,2} meaning "at least one but at most 2 digits", followed by
- \d{1,2}:\d{1,2}:\d{1,2}\s+, which is the hh:mm:ss block
- PM\s+: the PM
- \d+\s+\(\d*?\): the digit blocks
- Receive\s+rate\:\s+(\d*): the "Receive rate" with the following number block
- B/s\s*\n: the end
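To use the pattern above from Java, every backslash has to be doubled in the string literal. A minimal sketch (class and method names are illustrative): because the pattern ends in \n, it only matches lines that are followed by a newline, so each sample line below ends with one. As written, it also matches only PM timestamps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpecificPattern {
    // The pattern above as a Java string literal: every backslash is doubled.
    static final Pattern P = Pattern.compile(
            "(\\d{1,2}/\\d{1,2}/\\d{1,4}\\s+\\d{1,2}:\\d{1,2}:\\d{1,2}\\s+PM)"
            + "\\s+\\d+\\s+\\(\\d*?\\)\\s+Receive\\s+rate\\:\\s+(\\d*)\\s+B/s\\s*\\n");

    // Returns "timestamp, rate" for every newline-terminated line that matches.
    static List<String> extract(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + ", " + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Each line ends with \n because the pattern requires a trailing newline.
        String input = "9/14/2021 6:01:14 PM 42 (3224) Receive rate: 39338 B/s\n"
                + "9/14/2021 6:01:29 PM 92 (940) Receive rate: 215363 B/s\n";
        extract(input).forEach(System.out::println);
    }
}
```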
Or lazier:
(^[0-9/]*\s+[0-9:]+\s+PM)\s+[0-9 \(\)]+?\s+Receive\s+rate:\s+([0-9]+)\s+B/s\n
With:
- ^[0-9/]*\s+: a group of digits or /, followed by whitespace
- [0-9:]+\s+: a group of digits or :, followed by whitespace
- [0-9 \(\)]+?\s+: a group of digits, spaces, ( or ), matched non-greedily and followed by whitespace
- Receive\s+rate:\s+: literally what should stand there, with whitespace
- ([0-9]+)\s+B/s\n: the number block and the rest
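Because this variant starts with ^, whether it finds more than the first line of a multi-line input depends on the MULTILINE flag. A minimal sketch (the class name and helper are illustrative) that counts matches on two newline-terminated sample lines with and without Pattern.MULTILINE:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazierPattern {
    // The "lazier" pattern above as a Java string literal.
    static final String REGEX =
            "(^[0-9/]*\\s+[0-9:]+\\s+PM)\\s+[0-9 \\(\\)]+?\\s+Receive\\s+rate:\\s+([0-9]+)\\s+B/s\\n";

    // Counts how many times the pattern is found when compiled with the given flags.
    static int countMatches(String input, int flags) {
        Matcher m = Pattern.compile(REGEX, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "9/14/2021 6:01:14 PM 42 (3224) Receive rate: 39338 B/s\n"
                + "9/14/2021 6:01:29 PM 92 (940) Receive rate: 215363 B/s\n";
        // Without MULTILINE, ^ matches only at the very start of the input.
        System.out.println(countMatches(input, 0));                 // 1
        // With MULTILINE, ^ also matches right after each line terminator.
        System.out.println(countMatches(input, Pattern.MULTILINE)); // 2
    }
}
```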