I want to read my log file and write its contents to a CSV file using Java. How can I parse a log file with these delimiters into the CSV output shown below?
.log file:
2022-06-01 11:00:00 wt.nm=aa&wt.ti=t1&
2022-06-02 12:00:00 wt.nm=ab&wt.ti=t2&
2022-06-03 10:00:00 wt.nm=ac&wt.ti=t3&
The date and time are separated by a space. The name follows wt.nm= and the title follows wt.ti=, and each value is terminated by &.
CSV output:
date,time,name,title
2022-06-01,11:00:00,aa,t1
2022-06-02,12:00:00,ab,t2
2022-06-03,10:00:00,ac,t3
import java.io.*;

public class test {
    public static void main(String[] args) {
        try {
            BufferedReader in = new BufferedReader(new FileReader("/Users/ts/Desktop/test/src/0606.log"));
            FileWriter wb = new FileWriter("/Users/ts/Desktop/testcsv.csv");
            String str;
            while ((str = in.readLine()) != null) {
                System.out.println(str);
                wb.write(str);
            }
        }
        catch (IOException e) {
        }
    }
}
CodePudding user response:
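The following loop does the transformation. The dots in the pattern are escaped so they match a literal "." rather than any character, and a line separator is written after each row so the rows do not run together in the output file.
while ((str = in.readLine()) != null) {
    // " wt.nm=", "&wt.ti=" and the space between date and time all become commas;
    // the trailing "&" is then dropped
    str = str.replaceAll(" wt\\.nm=|&wt\\.ti=| ", ",").replace("&", "");
    System.out.println(str);
    wb.write(str + System.lineSeparator());
}
// close wb after the loop, otherwise buffered output may never reach the file
wb.close();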
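For completeness, here is a minimal sketch of how that replacement could sit in a full program. The class name RegexLogToCsv and the method names toCsvLine and convert are made up for illustration, and the sample runs against an in-memory string instead of the paths from the question. It assumes every line has exactly the date, time, wt.nm and wt.ti layout shown in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class RegexLogToCsv {

    // Turns "2022-06-01 11:00:00 wt.nm=aa&wt.ti=t1&" into "2022-06-01,11:00:00,aa,t1".
    static String toCsvLine(String logLine) {
        return logLine
                .replaceAll(" wt\\.nm=|&wt\\.ti=| ", ",")
                .replace("&", "");
    }

    // Writes a header row, then one CSV row per non-empty log line.
    static void convert(BufferedReader in, Writer out) throws IOException {
        out.write("date,time,name,title\n");
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            out.write(toCsvLine(line));
            out.write("\n");
        }
    }

    public static void main(String[] args) throws IOException {
        String sample = "2022-06-01 11:00:00 wt.nm=aa&wt.ti=t1&\n"
                      + "2022-06-02 12:00:00 wt.nm=ab&wt.ti=t2&\n";
        // For real files, use new BufferedReader(new FileReader(...)) and
        // new FileWriter(...) here; try-with-resources closes both.
        try (BufferedReader in = new BufferedReader(new StringReader(sample));
             StringWriter out = new StringWriter()) {
            convert(in, out);
            System.out.print(out);
        }
    }
}
```

The try-with-resources block matters for the file case: the program in the question never closes its FileWriter, so the CSV may end up empty even when the loop itself is correct.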
CodePudding user response:
The format in your data (ignoring the leading date and time) looks like:
- wt.nm is always present
- wt.ti is always present
- wt.nm always appears before wt.ti
- there are never additional name value pairs (beyond wt.nm and wt.ti)
However, the data has a more general pattern to it:
- one (or more) name value pairs
- for each name value pair, the name is separated from the value by =
- each name value pair is separated from other pairs by &
Also, it's easy to imagine encountering additional name value pairs (who says it will always be those same two forever? why not five name values in a single line?), or perhaps the ordering changes (could wt.ti show up before wt.nm?).
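If the ordering or presence of pairs really can vary, one option is to parse the pairs into a map first and then pick columns by name. This is only a sketch under that assumption: PairParser, parsePairs and toCsvLine are hypothetical names, and a missing pair is written as an empty field, which is a design choice and not something the question specifies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PairParser {

    // Parses "wt.nm=aa&wt.ti=t1&" into a map, keeping the order of appearance.
    static Map<String, String> parsePairs(String pairsText) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String pair : pairsText.split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty segments and segments without a name
            }
            pairs.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return pairs;
    }

    // Builds one CSV row: date, time, then the requested columns in the given order.
    static String toCsvLine(String logLine, String... columns) {
        String[] parts = logLine.trim().split(" ", 3); // date, time, name=value pairs
        Map<String, String> pairs = parsePairs(parts.length > 2 ? parts[2] : "");
        StringBuilder row = new StringBuilder(parts[0]);
        row.append(',').append(parts.length > 1 ? parts[1] : "");
        for (String column : columns) {
            row.append(',').append(pairs.getOrDefault(column, ""));
        }
        return row.toString();
    }

    public static void main(String[] args) {
        // wt.ti appears before wt.nm here, but the columns still come out as name,title
        System.out.println(toCsvLine("2022-06-01 11:00:00 wt.ti=t1&wt.nm=aa&", "wt.nm", "wt.ti"));
        // wt.ti is missing here, so the title column is left empty
        System.out.println(toCsvLine("2022-06-01 11:00:00 wt.nm=bb&", "wt.nm", "wt.ti"));
    }
}
```

Selecting columns by name keeps every row the same width, so the result still lines up with a date,time,name,title header even when the input does not.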
The code below takes a general approach, working with the general pattern of your input data. I included some sample data to show how it works.
- the first input line, "2022-06-01 11:00:00 wt.nm=aa&wt.ti=t1&", is one of your original inputs
- the second line, "2022-06-01 11:00:00 xxxxxxxxxxx=bb&", uses a different "name" altogether, and only has a single name value pair (not two): xxxxxxxxxxx=bb
- the third line, "2022-06-01 11:00:00 xxxxxxxxxxx=cc&yyyyyyyyyyyy=t3&zzzzzzzz=99&", includes a third name value pair: zzzzzzzz=99. Who says there will always be just two name value pairs?
// needs: import java.util.StringTokenizer;
String[] lines = {
        "2022-06-01 11:00:00 wt.nm=aa&wt.ti=t1&",
        "2022-06-01 11:00:00 xxxxxxxxxxx=bb&",
        "2022-06-01 11:00:00 xxxxxxxxxxx=cc&yyyyyyyyyyyy=t3&zzzzzzzz=99&"
};
for (String line : lines) {
    System.out.println("original: " + line);

    // turn every "&" into a space so the whole line splits on whitespace
    String edit1 = line.replaceAll("&", " ");
    System.out.println(" " + edit1);

    StringTokenizer tokenizer = new StringTokenizer(edit1);
    StringBuilder finalLine = new StringBuilder();
    while (tokenizer.hasMoreTokens()) {
        String token = tokenizer.nextToken();
        System.out.println(" token: " + token);
        if (token.contains("=")) {
            // name=value token: keep only the part after "="
            int positionOfEqualsSign = token.indexOf("=");
            String value = token.substring(positionOfEqualsSign + 1);
            finalLine.append(value);
        } else {
            // date or time token: keep as is
            finalLine.append(token);
        }
        if (tokenizer.hasMoreTokens()) {
            finalLine.append(",");
        }
    }
    System.out.println(" final: " + finalLine);
    System.out.println();
}
Here's the output from that code, which includes a lot of extra output in order to be easier to follow:
original: 2022-06-01 11:00:00 wt.nm=aa&wt.ti=t1&
 2022-06-01 11:00:00 wt.nm=aa wt.ti=t1
 token: 2022-06-01
 token: 11:00:00
 token: wt.nm=aa
 token: wt.ti=t1
 final: 2022-06-01,11:00:00,aa,t1

original: 2022-06-01 11:00:00 xxxxxxxxxxx=bb&
 2022-06-01 11:00:00 xxxxxxxxxxx=bb
 token: 2022-06-01
 token: 11:00:00
 token: xxxxxxxxxxx=bb
 final: 2022-06-01,11:00:00,bb

original: 2022-06-01 11:00:00 xxxxxxxxxxx=cc&yyyyyyyyyyyy=t3&zzzzzzzz=99&
 2022-06-01 11:00:00 xxxxxxxxxxx=cc yyyyyyyyyyyy=t3 zzzzzzzz=99
 token: 2022-06-01
 token: 11:00:00
 token: xxxxxxxxxxx=cc
 token: yyyyyyyyyyyy=t3
 token: zzzzzzzz=99
 final: 2022-06-01,11:00:00,cc,t3,99
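Once the debug output is no longer needed, the same idea can be condensed into a small helper. The sketch below swaps StringTokenizer for String.split on spaces and & characters, which should behave the same for the sample lines above; GeneralLogToCsv and toCsvLine are hypothetical names. It assumes values never contain spaces, commas or &.

```java
import java.util.ArrayList;
import java.util.List;

class GeneralLogToCsv {

    // Splits on runs of spaces and '&', keeps the text after '=' for name=value
    // tokens, keeps other tokens (date, time) unchanged, and joins with commas.
    static String toCsvLine(String logLine) {
        List<String> fields = new ArrayList<>();
        for (String token : logLine.trim().split("[ &]+")) {
            int eq = token.indexOf('=');
            fields.add(eq >= 0 ? token.substring(eq + 1) : token);
        }
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        String[] lines = {
                "2022-06-01 11:00:00 wt.nm=aa&wt.ti=t1&",
                "2022-06-01 11:00:00 xxxxxxxxxxx=bb&",
                "2022-06-01 11:00:00 xxxxxxxxxxx=cc&yyyyyyyyyyyy=t3&zzzzzzzz=99&"
        };
        for (String line : lines) {
            System.out.println(toCsvLine(line));
        }
    }
}
```

Note that rows produced this way can have different numbers of columns, so a fixed header such as date,time,name,title only fits when every line carries exactly two pairs.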