I have a file containing a list of subdomains, and I want to grab only the domain name, i.e. the last 2 words of each line, using . as the delimiter. Illustration:
blog.clova.line.me
to
line.me
Based on the illustration, I ran the command below to strip the subdomain:
cut -d "." -f "2-3"
The above command only works for subdomains that are 3 levels deep. For anything deeper, it returns only the middle fields:
clova.line.me -> line.me
blog.clova.line.me -> clova.line
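So I had to use a shell script that looks like this:
while read -r line; do
  # count the dots in the current line
  i=$(printf '%s' "$line" | tr -cd "." | wc -c)
  if [ "$i" -eq 2 ]; then printf '%s\n' "$line" | cut -d "." -f "2-3"
  elif [ "$i" -eq 3 ]; then printf '%s\n' "$line" | cut -d "." -f "3-4"
  fi
done
and so on.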
I think this is inefficient, as it counts the . characters in every line. I also tried a regex:
([a-z]*)\.([a-z]*)$
The above regex doesn't work when the domain contains a special character such as -, because only the part after the last hyphen is matched:
mobile.amazon-aws.com --> aws.com
branch.line-dev.net --> dev.net
git.line-apps-dev.net --> dev.net
yugioh.line-games-de-dev.net --> dev.net
In summary:
- Subdomains can be 3 or more levels deep
- The domain can contain one or more - characters
Target: grab the domain from the subdomain list following the rules above.
Any help would be cherished.
CodePudding user response:
Simply include the - character in your regex:
([a-z-]*)\.([a-z]*)$
Demo: https://regex101.com/r/Yn9Aqm/1
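To apply this regex to a file from the command line, something like the following might work. It is only a sketch: strip_subdomain is a made-up helper name, and it assumes a grep that supports the -o option (GNU and BSD grep both do, but -o is not part of POSIX). The capture groups are dropped because only the whole match is printed.

```shell
# strip_subdomain is a hypothetical helper name, not part of the question.
# It reads hostnames on stdin and prints the last two labels of each line.
strip_subdomain() {
  # -o prints only the matched text, -E enables extended regex syntax.
  # The pattern only covers lowercase letters and hyphens, as in the answer.
  grep -oE '[a-z-]*\.[a-z]*$'
}

printf '%s\n' blog.clova.line.me mobile.amazon-aws.com yugioh.line-games-de-dev.net |
  strip_subdomain
# prints:
# line.me
# amazon-aws.com
# line-games-de-dev.net
```

Domains containing digits or uppercase letters would need a wider character class, for example [A-Za-z0-9-].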
Alternative: You might want to tokenize the string by dots instead, then pick the last 2 items of the array. That is easier than using a regex.
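The tokenizing approach could be sketched with awk, which splits each line into fields for you. This is one possible way to do it, not the only one; last_two_labels is a hypothetical helper name chosen for illustration.

```shell
# last_two_labels is a hypothetical helper name used for illustration.
# It reads hostnames on stdin and prints the last two dot-separated fields.
last_two_labels() {
  # -F. makes the dot the field separator; NF is the number of fields.
  # Lines with fewer than two fields are skipped.
  awk -F. 'NF >= 2 { print $(NF-1) "." $NF }'
}

printf '%s\n' clova.line.me blog.clova.line.me git.line-apps-dev.net |
  last_two_labels
# prints:
# line.me
# line.me
# line-apps-dev.net
```

Because the fields are counted from the end of the line, the depth of the subdomain and any hyphens inside a label do not matter.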