How to grab URLs with .com, .org, or .net domains from a text file using a Unix command

Time:09-21

I need a regex for copying domain names from text files. In the text files, the domains look like this:

site.com   - org name - title
site.net   - other name - another title
HTTP://target.ca - ca site - ca title

From this text file I need:

site.com
site.net
target.ca

I tried sed 's/\.com\/.*/.com/' file.txt, but this command only gives me the .com domains, and I need all the domain names. Please help me out.

Thank you.

CodePudding user response:

1st solution: With your shown samples, please try the following awk code. It sets the field separator to a space OR a / for every line. In the main block, if the line starts with HTTP: it prints the 3rd field (the 2nd field is the empty string between the two slashes); otherwise it prints the 1st field.

awk -F' |/' '/^HTTP:/{print $3;next} {print $1}' Input_file
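The awk command above matches only an uppercase HTTP: prefix. As a hedged variation, not part of the original answer, the sketch below treats any run of spaces or slashes as one separator and lowercases the first field before testing it, so http:// and https:// lines are handled too. The function name extract_domains is hypothetical and introduced only for illustration.

```shell
# Hypothetical helper: read lines on stdin, print the domain from each line.
# Assumes the domain is either the first token or follows a scheme such as HTTP://.
extract_domains() {
    # -F'[ /]+' : one or more spaces or slashes act as a single field separator,
    #             so "HTTP://target.ca" splits into "HTTP:" and "target.ca".
    awk -F'[ /]+' '
        tolower($1) ~ /^https?:$/ { print $2; next }  # scheme present: domain is field 2
        { print $1 }                                  # no scheme: domain is field 1
    '
}

# Usage: feed it the sample lines from the question.
printf '%s\n' \
    'site.com   - org name - title' \
    'HTTP://target.ca - ca site - ca title' | extract_domains
```

Because the separator is a run of characters, the empty field between the two slashes disappears, which is why the domain moves from the 3rd field to the 2nd in this version.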


2nd solution: Using sed, please try the following code. The -E option enables ERE (extended regular expressions). The first capturing group optionally matches the HTTP:// prefix, the second captures the run of non-space characters that follows, and the whole line is replaced by the second group.

sed -E 's/^(HTTP:\/\/)?([^[:space:]]+).*$/\2/' Input_file


3rd solution: Using GNU grep with the -P (PCRE) option and its \K escape, which lets us match text with the regex and then discard it from the printed result. Here the optional HTTP:// prefix is matched and dropped, and -o prints only the run of non-space characters after it.

grep -oP '^(HTTP:\/\/)?\K([^[:space:]]+)' Input_file
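The solutions above print every domain, as the question body asks. If only the .com, .org, and .net domains from the title are wanted, one possible sketch is to strip the scheme and trailing text with sed and then filter by ending with grep. The function name domains_with_tld is hypothetical, and the list of endings is an assumption taken from the question title.

```shell
# Hypothetical helper: read lines on stdin, print only domains ending in
# .com, .org, or .net (compared case-insensitively).
domains_with_tld() {
    # 1st expression: drop a leading scheme such as HTTP:// or https://
    # 2nd expression: drop everything from the first whitespace or slash onward
    sed -E -e 's#^[A-Za-z]+://##' -e 's#[[:space:]/].*$##' |
        grep -Ei '\.(com|org|net)$'
}

# Usage: target.ca is extracted by sed but filtered out by grep.
printf '%s\n' \
    'site.com   - org name - title' \
    'site.net   - other name - another title' \
    'HTTP://target.ca - ca site - ca title' | domains_with_tld
```

Using # as the sed delimiter avoids escaping the slashes in the scheme. Note that grep exits with status 1 when no line matches, which matters if the function is called under set -e.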