remove specific string from text file using sed

Time:08-03

I have a file called a.txt that contains:

time="2022-08-02T15:07:53+05:30" level=info msg="\x1b[32m\x1b[1mPUBLIC\x1b[39m\x1b[0m http://some.s3-ap-southeast-2.amazonaws.com/ (\x1b[33mhttp://some.com\x1b[39m)"
time="2022-08-02T15:07:53+05:30" level=info msg="\x1b[31m\x1b[1mFORBIDDEN\x1b[39m\x1b[0m http://some.s3.amazonaws.com (\x1b[33mhttp://some.com\x1b[39m)"
time="2022-08-02T15:07:54+05:30" level=info msg="\x1b[31m\x1b[1mFORBIDDEN\x1b[39m\x1b[0m http://some.s3.amazonaws.com (\x1b[33mhttp://some.com\x1b[39m)"
time="2022-08-02T15:07:58+05:30" level=info msg="\x1b[31m\x1b[1mFORBIDDEN\x1b[39m\x1b[0m http://some-assets.s3.amazonaws.com (\x1b[33mhttp://some.com\x1b[39m)"
time="2022-08-02T15:08:01+05:30" level=info msg="\x1b[31m\x1b[1mFORBIDDEN\x1b[39m\x1b[0m http://some.s3.amazonaws.com (\x1b[33mhttp://some.com\x1b[39m)"

I want this output

PUBLIC    http://some.s3-ap-southeast-2.amazonaws.com
FORBIDDEN http://some.s3.amazonaws.com
FORBIDDEN http://some.s3.amazonaws.com
FORBIDDEN http://some-assets.s3.amazonaws.com
FORBIDDEN http://some.s3.amazonaws.com

I tried this

cat a.txt | cut -d "=" -f4- | cut -d "[" -f3- | cut -d "m" -f2- |  awk -F '\\.amazonaws.com' '{print $1".amazonaws.com"}'

This works, but I am not able to remove \x1b[39m\x1b[0m.

CodePudding user response:

Using sed

$ sed -E 's~([^[]*\[){2}[^A-Z]*([^\]*)[^ ]* ([^ ]*\.[a-z]+).*~\2 \3~' input_file | column -t
PUBLIC     http://some.s3-ap-southeast-2.amazonaws.com
FORBIDDEN  http://some.s3.amazonaws.com
FORBIDDEN  http://some.s3.amazonaws.com
FORBIDDEN  http://some-assets.s3.amazonaws.com
FORBIDDEN  http://some.s3.amazonaws.com
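
The single expression above can also be split into two simpler passes, which may be easier to adapt if the log format changes. The following is only a sketch under stated assumptions: the colour codes are taken to be the literal text `\x1b[` followed by digits and `m` (as in the question), the timestamps are assumed to carry a `+05:30` offset, and `sample.txt` is a hypothetical file name used just for this demonstration.

```shell
# Build a two-line sample in the same shape as a.txt (sample.txt is a hypothetical name).
# The quoted 'EOF' keeps the backslashes literal.
cat > sample.txt <<'EOF'
time="2022-08-02T15:07:53+05:30" level=info msg="\x1b[32m\x1b[1mPUBLIC\x1b[39m\x1b[0m http://some.s3-ap-southeast-2.amazonaws.com/ (\x1b[33mhttp://some.com\x1b[39m)"
time="2022-08-02T15:07:53+05:30" level=info msg="\x1b[31m\x1b[1mFORBIDDEN\x1b[39m\x1b[0m http://some.s3.amazonaws.com (\x1b[33mhttp://some.com\x1b[39m)"
EOF

# Pass 1: delete every literal \x1b[<digits>m colour code.
# Pass 2: keep the upper-case status word and the URL up to its last ".tld".
out=$(sed -E \
  -e 's/\\x1b\[[0-9;]*m//g' \
  -e 's~^.*msg="([A-Z]+) ([^ ]*\.[a-z]+).*~\1 \2~' \
  sample.txt)
printf '%s\n' "$out"
```

Piping the result through `column -t` aligns the two columns in the same way as the one-liner.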

CodePudding user response:

You may use this awk solution:

awk -F= '{gsub(/^.*1m|\/? \(.*$|\\x[^[:blank:]]*/, "", $4); print $4}' file | column -t

PUBLIC     http://some.s3-ap-southeast-2.amazonaws.com
FORBIDDEN  http://some.s3.amazonaws.com
FORBIDDEN  http://some.s3.amazonaws.com
FORBIDDEN  http://some-assets.s3.amazonaws.com
FORBIDDEN  http://some.s3.amazonaws.com

Use column -t for formatting of output.
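
To see why the substitutions are applied to `$4`, it can help to look at how `-F=` splits one line. This is a small illustrative sketch, not part of the answer itself; the sample line is copied from the question, with the `+05:30` offset assumed.

```shell
# One sample line in the same shape as a.txt (single-quoted so the backslashes stay literal).
line='time="2022-08-02T15:07:53+05:30" level=info msg="\x1b[31m\x1b[1mFORBIDDEN\x1b[39m\x1b[0m http://some.s3.amazonaws.com (\x1b[33mhttp://some.com\x1b[39m)"'

# Print the number of "="-separated fields, then the fourth field on its own line.
out=$(printf '%s\n' "$line" | awk -F= '{ print NF; print $4 }')
printf '%s\n' "$out"
```

The line splits into four fields, and the fourth one holds the whole quoted `msg` value, so the three alternatives in the `gsub` pattern only need to trim the leading colour codes, the trailing ` (...)` part, and the remaining `\x...` sequences.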

CodePudding user response:

With your shown samples, please try the following awk code, written and tested in GNU awk. It combines GNU awk with the column command and uses awk's match function to capture the substrings needed for the required output.

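awk '
BEGIN{ OFS="\t" }
# arr[1] captures the status word, arr[2] the URL that follows the colour codes
match($0,/^time=".*level=\S+\smsg="[^[]*\[[^[]*\[1m([^\\]*)\\x1b\S+\s(https?:\/\/\S+)/,arr){
  sub(/\/$/,"",arr[2])   # drop a trailing slash so the URL matches the requested output
  print arr[1],arr[2]
}
' Input_file | column -t -s $'\t'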
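
The three-argument form of `match` and the `\S`/`\s` shorthands are GNU awk extensions, so the code above may not run under other awk implementations. A possible portable variant is sketched below; it swaps in plain `gsub`, `sub`, and `split` instead of capture groups, and it assumes the colour codes are the literal text `\x1b[...m` and that the status word and URL are the first two space-separated words of the message. `sample.txt` is a hypothetical file name.

```shell
# Build a two-line sample in the same shape as a.txt (sample.txt is a hypothetical name).
cat > sample.txt <<'EOF'
time="2022-08-02T15:07:53+05:30" level=info msg="\x1b[32m\x1b[1mPUBLIC\x1b[39m\x1b[0m http://some.s3-ap-southeast-2.amazonaws.com/ (\x1b[33mhttp://some.com\x1b[39m)"
time="2022-08-02T15:07:53+05:30" level=info msg="\x1b[31m\x1b[1mFORBIDDEN\x1b[39m\x1b[0m http://some.s3.amazonaws.com (\x1b[33mhttp://some.com\x1b[39m)"
EOF

out=$(awk '
{
  line = $0
  gsub(/\\x1b\[[0-9;]*m/, "", line)   # drop the literal \x1b[..m colour codes
  sub(/^.*msg="/, "", line)           # keep only the message text
  split(line, parts, " ")             # parts[1] = status, parts[2] = URL
  sub(/\/$/, "", parts[2])            # trim a trailing slash from the URL
  print parts[1], parts[2]
}' sample.txt)
printf '%s\n' "$out"
```

As with the other answers, the output can be piped through `column -t` for alignment.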