This has been giving me a lot of trouble. The file looks like this:
URL: http://123.123.123.123
file: php
124.124.124.124|user1|email|phone

URL: http://1.2.3.4
file: php
2.1.3.1|userx|emailx|phonex
The file contains more sets of data just like these.
I used
grep http -A 3|tr '\n' ' '|tr '|' ' '|awk '{print $2,$7,$8}'|tr ' ' ':'
but the outcome only covers the first set of data:
123.123.123.123:email:phone
Intended outcome:
123.123.123.123:email:phone
1.2.3.4:emailx:phonex
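For reference, a self-contained reproduction of the pipeline above (the file name data.txt is an assumption); note that $2 keeps the http:// prefix here:

```shell
#!/bin/sh
# Recreate the sample file from the question (name data.txt assumed).
cat > data.txt <<'EOF'
URL: http://123.123.123.123
file: php
124.124.124.124|user1|email|phone

URL: http://1.2.3.4
file: php
2.1.3.1|userx|emailx|phonex
EOF

# The tr '\n' ' ' step collapses all of grep's output onto ONE line,
# so awk sees a single record and $2,$7,$8 only reach the first set.
grep http -A 3 data.txt | tr '\n' ' ' | tr '|' ' ' \
  | awk '{print $2,$7,$8}' | tr ' ' ':'
```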
CodePudding user response:
If you are using Awk anyway, you can get rid of grep and tr.
If you can rely on the empty line to separate records, try RS='\n\n'. Here's a refactoring which instead extracts the information from the third line of each block (the second line after the hit).
awk '/http/ { l=2; ip=$0; sub(/.*\/\//, "", ip); next }
l && --l == 0 { tail=$0; sub(/^[^|]*[|][^|]*[|]/, "", tail);
sub(/[|]/, ":", tail); print ip ":" tail }'
Perhaps /^URL:/ would be a better regex than /http/ for finding the beginning of a record.
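A quick way to exercise this answer's script against the question's sample (the file name data.txt is an assumption):

```shell
#!/bin/sh
# Sample data from the question (file name data.txt assumed).
cat > data.txt <<'EOF'
URL: http://123.123.123.123
file: php
124.124.124.124|user1|email|phone

URL: http://1.2.3.4
file: php
2.1.3.1|userx|emailx|phonex
EOF

# Each URL line arms a two-line countdown; on the data line the first
# two |-delimited fields are stripped and the rest is printed.
awk '/http/ { l=2; ip=$0; sub(/.*\/\//, "", ip); next }
     l && --l == 0 { tail=$0; sub(/^[^|]*[|][^|]*[|]/, "", tail);
                     sub(/[|]/, ":", tail); print ip ":" tail }' data.txt
```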
CodePudding user response:
gawk 'gsub("[|]", ":", $!(NF = NF))' RS= OFS= FS='.*//|\n[^|]*[|][^|]*'
123.123.123.123:email:phone
1.2.3.4:emailx:phonex
CodePudding user response:
I'd do it like this:
awk -F\| '
/^URL:/ { sub(/.*\/\//,""); url=$0; next }
NF==4 { printf "%s:%s:%s\n", url, $3, $4 }
' file
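Running this answer on the question's sample can be sketched as follows (the file name data.txt is an assumption):

```shell
#!/bin/sh
# Sample data from the question (file name data.txt assumed).
cat > data.txt <<'EOF'
URL: http://123.123.123.123
file: php
124.124.124.124|user1|email|phone

URL: http://1.2.3.4
file: php
2.1.3.1|userx|emailx|phonex
EOF

# With -F'|' the data lines have exactly 4 fields; the URL line is
# remembered (scheme stripped) and combined with fields 3 and 4.
awk -F\| '
/^URL:/ { sub(/.*\/\//,""); url=$0; next }
NF==4   { printf "%s:%s:%s\n", url, $3, $4 }
' data.txt
```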
CodePudding user response:
If ed is available/acceptable, the script (script.ed):
g/^$/d
g|^URL: http://|s|||\
.+1d
%s/^.*user[^|]*//
g/^[0-9]/j
%s/|/:/g
,p
Q
Run
ed -s file.txt < script.ed
CodePudding user response:
I would exploit the getline function for this task as follows. Let file.txt content be
URL: http://123.123.123.123
file: php
124.124.124.124|user1|email|phone
URL: http://1.2.3.4
file: php
2.1.3.1|userx|emailx|phonex
then
awk 'BEGIN{FS="|";OFS=":"}sub(/^URL: /,""){url=$0;getline;getline;print url,$3,$4}' file.txt
gives output
http://123.123.123.123:email:phone
http://1.2.3.4:emailx:phonex
Explanation: I inform GNU AWK that the field separator (FS) is a pipe (|), whilst the output field separator (OFS) is a colon (:). I use two effects of sub: alteration of the line and its return value. If an alteration occurred, I save the current line (with the leading URL: removed by sub) as url, call getline twice to reach the second line after it, and then print url and the 3rd and 4th columns.
(tested in GNU Awk 5.0.1)
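The run above can be checked end to end; the second command is an editorial variant (not from the answer) that also strips the http:// scheme to match the asker's intended output:

```shell
#!/bin/sh
# Recreate file.txt exactly as shown in this answer.
cat > file.txt <<'EOF'
URL: http://123.123.123.123
file: php
124.124.124.124|user1|email|phone
URL: http://1.2.3.4
file: php
2.1.3.1|userx|emailx|phonex
EOF

# As posted: the URL keeps its http:// prefix.
awk 'BEGIN{FS="|";OFS=":"}sub(/^URL: /,""){url=$0;getline;getline;print url,$3,$4}' file.txt

# Variant (assumption): strip "URL: http://" so only the IP remains.
awk 'BEGIN{FS="|";OFS=":"}sub(/^URL: http:\/\//,""){url=$0;getline;getline;print url,$3,$4}' file.txt
```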