Parse Text with "awk" and Modify One Of The Columns With "sed"

Time:05-14

I have data separated by pipes ("|") that I would like to parse with awk and write to a DB.

EndpointRequest|ID-ip-172-31-70-119-eu-west-1-compute-internal-209879772|2022-05-12 08:20:03:467|0|ip-172-31-70-119|616e50193233020648|vfgh|GenericAmount|61d458303574b21f|Display|v1|Display-v1|PrepaidEndpoint|6227300ec1786d26|Corporate|62273041c8cf901071786d81|Health Line||||69.28.67.153|Java/1.8.0_321|application/xml|468|475|POST||http://127.0.0.1/endpoint/||200||2022-05-12 08:20:03:458|0|468|7|0|0|0|true|Http|null|null|HTTPConnector:CallPrepaid|Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2\nAuthorization: Bearer e3edbb1d8f5d8c828dc584ed293602bf\nContent-Type: application/xml\nX-Amzn-Trace-Id: Root=1-627cc333-7167\nX-Forwarded-For: XX.XX.XX.XX\nX-Forwarded-Port: 443\nX-Forwarded-Proto: https\n\n<?xml version="1.0"?>\n<!DOCTYPE cp_request SYSTEM "cp_req_websvr.dtd">\n<cp_request>\n    <cp_id>YY1880</cp_id>\n    <cp_transaction_id>SDP</cp_transaction_id>\n    <op_transaction_id>arr684754251</op_transaction_id>\n    <application>1</application>\n    <action>2</action>\n    <user_id type="MSISDN">9999999999</user_id>\n    <cp_timer>5</cp_timer>\n    <transaction_price>1900</transaction_price>\n    <transaction_currency>0</transaction_currency>\n</cp_request>

The data has many lines like the one above and I use the command below to get certain fields.

more file.log | egrep "EndpointRequest|EndpointSuccess|EndpointFailure" | egrep "PrepaidEndpoint" | awk -F"|" '{print $1"|"$2"|"$3"|"$4"|"$5"|"$12"|"$13"|"$15"|"$17"|"$21"|"$25"|"$30"|"$31"|"$32"|"$33"|"$44}'

The catch is that the last field (#44) holds an HTTP response that contains some headers and an XML payload. I need to get the "op_transaction_id" value ("arr684754251") and append it to the output of the awk command, but I am unable to do so. In a separate command, I can get that value with "sed":

sed -n "s/.*<op_transaction_id>\(.*\)<\/op_transaction_id>.*/\1/p" file.log

How do I migrate the "sed" command into the "awk" command, so that I can have the "op_transaction_id" value as one of the fields in "awk"?

Expected output:

EndpointRequest|ID-ip-172-31-70-119-eu-west-1-compute-internal-209879772|2022-05-12 08:20:03:467|0|ip-172-31-70-119|Display-v1|PrepaidEndpoint|Corporate|Health Line|69.28.67.153|475|200||2022-05-12 08:20:03:458|0|arr684754251

Thank you bash gurus. Any help is appreciated.

CodePudding user response:

How do I migrate the "sed" command into the "awk" command

You might harness the gensub function. Consider the following simple example: let file.txt be |-separated with 3 columns:

<tag>text1</tag>|A|1
<tag>text2</tag>|B|2
<tag>text3</tag>|C|3

and say you want to extract what is inside the tag in the 1st field and use , as the output separator; then you might do

awk 'BEGIN{FS="|";OFS=","}{$1=gensub(/<tag>(.+)<\/tag>/,"\\1",1,$1);print}' file.txt

which gives output

text1,A,1
text2,B,2
text3,C,3

The arguments to gensub are the regular expression, the replacement, how (either a number indicating which occurrence to replace, or "g" for all) and the target. gensub returns the altered string, which we then assign as the new value of the 1st field. FS specifies that the field separator is | and OFS that the output field separator is ,. Note that you must not mindlessly copy your regular expression from sed into the 1st argument of gensub. For example, in GNU sed ( and ) denote literal parentheses and must be escaped to form a capturing group, whereas in GNU AWK ( and ) denote a capturing group and must be escaped to match literal parentheses.

(tested in gawk 4.2.1)
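
Applied to the log format from the question, the same idea might look like the sketch below. It is not the gensub call described above: it swaps in POSIX match() and substr(), which also work in awk implementations other than gawk. The function name extract_txn and the shortened 4-field sample line are hypothetical; in the real log the payload is field 44, so the print statement would list the columns from the original command instead of $1 to $3.

```shell
#!/bin/sh
# Sketch: pull the <op_transaction_id> value out of the last field with POSIX awk.
# extract_txn and the 4-field sample line are hypothetical; in the question's
# log the payload sits in field 44 (the last field).

extract_txn() {
    awk -F'|' '
    BEGIN {
        open_tag  = "<op_transaction_id>"
        close_tag = "</op_transaction_id>"
    }
    {
        id = ""
        # match() sets RSTART and RLENGTH to the span of the whole tag pair
        if (match($NF, open_tag "[^<]*" close_tag)) {
            # keep only the text between the opening and closing tags
            id = substr($NF, RSTART + length(open_tag),
                        RLENGTH - length(open_tag) - length(close_tag))
        }
        # print the wanted columns, then the extracted id as an extra column
        print $1 "|" $2 "|" $3 "|" id
    }'
}

# printf with %s leaves the literal \n sequences in the sample untouched
result=$(printf '%s\n' 'EndpointRequest|ID-1|2022-05-12 08:20:03:467|Content-Type: application/xml\n\n<cp_request>\n    <op_transaction_id>arr684754251</op_transaction_id>\n</cp_request>' | extract_txn)
echo "$result"
```

With this sample the id column comes out as arr684754251; a line whose last field has no such tag gets an empty final column rather than being dropped.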
