I need a regex for copying domain names from text files. In text files, the domains look like
site.com - org name - title
site.net - other name - another title
HTTP://target.ca - ca site - ca title
From this text file I need
site.com
site.net
target.ca
I tried sed 's/\.com\/.*/.com/' file.txt
but this command only gives me the .com domains, and I need all of the domain names. Please help me out.
Thank you.
CodePudding user response:
1st solution: With your shown samples, please try the following awk code. Simple explanation: set the field separator to a space OR / for all lines. Then, in the main block of the awk program, check whether the line starts with HTTP: and, if it does, print the 3rd field; otherwise print the 1st field to get the required values.
awk -F' |/' '/^HTTP:/{print $3;next} {print $1}' Input_file
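To see why the HTTP line needs the 3rd field rather than the 1st, it can help to print the first few fields awk produces with this separator. This is only a small sketch, assuming a POSIX shell and awk; the function name show_fields is made up for illustration.

```shell
# Hypothetical helper: print the first three fields awk sees on each input
# line when the field separator is "a space OR a slash".
show_fields() {
  awk -F' |/' '{ for (i = 1; i <= 3; i++) printf "field %d = [%s]\n", i, $i }'
}

printf '%s\n' 'HTTP://target.ca - ca site - ca title' | show_fields
# field 1 = [HTTP:]
# field 2 = []
# field 3 = [target.ca]
```

The empty 2nd field comes from the two consecutive slashes in HTTP://, which is why the domain ends up in $3.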
2nd solution: Using sed, please try the following code. It uses sed's -E option to enable ERE (extended regular expressions) together with sed's capturing groups: the optional HTTP:// prefix goes into group 1, the run of non-space characters after it goes into group 2, and the whole line is replaced with group 2.
sed -E 's/^(HTTP:\/\/)?([^[:space:]]+).*$/\2/' Input_file
3rd solution: Using GNU grep with its \K escape (available in PCRE mode, -P), which lets us match part of the regex and then drop it from the printed match. Here the optional HTTP:// prefix is matched and discarded, and -o prints only what follows it.
grep -oP '^(HTTP:\/\/)?\K([^[:space:]]+)' Input_file
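If the real files also contain lowercase http:// or https:// prefixes, or a path after the domain, a slightly more general variant might look like the sketch below. This goes beyond the shown samples, so treat it as an assumption: it presumes any scheme is a run of letters followed by ://, and the function name extract_domains is hypothetical.

```shell
# Hypothetical helper: print the domain at the start of each line.
extract_domains() {
  # 1st substitution: drop an optional leading scheme such as HTTP:// or https://
  # 2nd substitution: drop everything from the first whitespace or slash onward
  sed -E 's#^[[:alpha:]]+://##; s#[[:space:]/].*$##' "$@"
}

printf '%s\n' \
  'site.com - org name - title' \
  'site.net - other name - another title' \
  'HTTP://target.ca - ca site - ca title' | extract_domains
# site.com
# site.net
# target.ca
```

With no arguments the function reads standard input; with a file name (for example extract_domains Input_file) it reads that file instead.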