I have a log file that has the following structure:
- Each record begins with a timestamp such as YYYY-MM-DD-hh.mm.ss.ffffff
- It is followed by one or more lines, which may include empty lines
- A new record begins after exactly two newline characters (\n), followed by a timestamp at the beginning of the line
Sample:
2022-05-05-14.06.15.041968 some data
that can spread to one line
2022-05-05-14.06.16.036412 some data
that can spread to
two lines
2022-05-05-14.06.17.234123 some data
that can spread to
two lines
or multiple lines with empty new lines
I would like to get:
2022-05-05-14.06.15.041968 some data that can spread to one line
2022-05-05-14.06.16.036412 some data that can spread to two lines
2022-05-05-14.06.17.234123 some data that can spread to two lines or multiple lines with empty new lines
How can I solve this using Linux commands such as sed, awk, tr, and similar?
CodePudding user response:
This should get the job done, with the limitation that the data following the timestamp of each record must not itself contain a string in the timestamp format:
sed '/^$/d' your_file_name | tr '\n' ' ' | sed -E 's/([0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]{2}\.[0-9]{2}\.[0-9]{2}\.[0-9]{6})/\n\1/g' | sed 1d
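The limitation mentioned above can be reproduced with a small sketch. It wraps the same pipeline in a function reading from standard input; the function name `join_records` and the sample record are made up for illustration, and GNU sed is assumed, since it interprets `\n` in the replacement as a newline.

```shell
# Same pipeline as above, reading from standard input (GNU sed assumed).
join_records() {
  sed '/^$/d' |
    tr '\n' ' ' |
    sed -E 's/([0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]{2}\.[0-9]{2}\.[0-9]{2}\.[0-9]{6})/\n\1/g' |
    sed 1d
}

# Hypothetical record whose data mentions another timestamp.
# The embedded timestamp is treated as the start of a new record,
# so this single record comes out as two lines.
printf '%s\n' \
  '2022-05-05-14.06.15.041968 retry of 2022-05-05-14.06.10.000001 failed' \
  'on second attempt' |
  join_records
```

Note that each output line keeps a trailing space left over from the `tr` step; piping through `sed 's/ *$//'` would strip it if that matters.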
CodePudding user response:
Suggesting a gawk script (gawk is the default awk on many Linux distributions):
gawk '{gsub(/\n+/," ");gsub("\n","",RT);print $0 "\n"RT}' RS="\n[[:digit:]]{4}-[[:digit:]]{2}-" ORS="" input.txt
Results:
2022-05-05-14.06.15.041968 some data that can spread to one line
2022-05-05-14.06.16.036412 some data that can spread to two lines
2022-05-05-14.06.17.234123 some data that can spread to two lines or multiple lines with empty new lines
CodePudding user response:
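awk '
{
    # A line whose first field starts with YYYY-MM-DD- begins a new record
    if ( $1 ~ /^[0-9]{4}-[0-9]{2}-[0-9]{2}-/ ) {
        printf "\n%s", $0
    } else {
        # Append non-empty continuation lines, separated by a space
        if ($1 != "") printf " %s", $0
    }
}
END {
    printf "\n"
}' input_file | sed 1d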
2022-05-05-14.06.15.041968 some data that can spread to one line
2022-05-05-14.06.16.036412 some data that can spread to two lines
2022-05-05-14.06.17.234123 some data that can spread to two lines or multiple lines with empty new lines
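A possible variant of the awk approach above, offered only as a sketch: it buffers each record and prints it once the next timestamp line arrives, so no `sed 1d` is needed afterwards. It also spells out the digit classes instead of using `{4}`-style intervals, on the assumption that some older awk implementations do not support them. The function name `join_log` is hypothetical.

```shell
# Join multi-line log records into one line each.
# Reads the files given as arguments, or standard input if none.
join_log() {
  awk '
    # A line starting with YYYY-MM-DD- opens a new record.
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-/ {
      if (rec != "") print rec   # flush the previous record
      rec = $0
      next
    }
    # Non-blank continuation lines are appended with a single space.
    NF { rec = rec " " $0 }
    # Flush the last record at end of input.
    END { if (rec != "") print rec }
  ' "$@"
}

# Usage, assuming the log is in a file named input_file:
# join_log input_file
```

Like the first answer, this treats any line that begins with a date as a new record, so continuation lines that happen to start with a date would be split off.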