Grab 2 words from end of line

Time:06-08

I have a file containing a list of subdomains, and I want to grab only the domain name, which is the last 2 words of each line when split on the . delimiter. Illustration:

blog.clova.line.me
to
line.me

Based on that illustration, I ran the command below to strip the subdomain:

cut -d "." -f "2-3"

That command only works for subdomains that are 3 levels deep. For anything deeper, it returns the middle fields instead:

clova.line.me -> line.me
blog.clova.line.me -> clova.line

So I had to use shell scripting, which looks like this:

while read -r line; do
  dots=$(tr -cd "." <<< "$line" | wc -c)   # count the dots in the line
  if [ "$dots" -eq 2 ]; then cut -d "." -f "2-3" <<< "$line"
  elif [ "$dots" -eq 3 ]; then cut -d "." -f "3-4" <<< "$line"
  fi   # and so on for deeper subdomains
done

I think this is inefficient, as it counts the . characters on every single line. I tried to use a regex instead:

([a-z]*)\.([a-z]*)$

The above regex doesn't work when the domain contains a special character such as -, because it only captures the part after the last -:

mobile.amazon-aws.com --> aws.com
branch.line-dev.net --> dev.net
git.line-apps-dev.net --> dev.net
yugioh.line-games-de-dev.net --> dev.net

In summary:

  • Subdomains can be 3 or more levels deep
  • The domain name can contain one special character: -
  • Target: grab the domain from the subdomain list following the rules above

Any help would be cherished.

CodePudding user response:

Simply include the - character in your regex: ([a-z-]*)\.([a-z]*)$

Demo: https://regex101.com/r/Yn9Aqm/1
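
On the command line, the same pattern could be applied with grep. This is only a minimal sketch: it assumes a grep that supports -o (GNU and BSD grep do), assumes the names contain only lowercase letters and -, and uses + instead of * so that empty labels are not matched. The function name grab_domain_regex is made up for the example.

```shell
# Hypothetical helper: print the last two dot-separated labels of each input line.
# -o prints only the matched part of the line, -E enables extended regex.
grab_domain_regex() {
  grep -oE '[a-z-]+\.[a-z]+$'
}

# Example: feed two of the subdomains from the question through the helper.
printf '%s\n' blog.clova.line.me git.line-apps-dev.net | grab_domain_regex
```

To process a whole file, redirect it into the function instead of using printf.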


Alternative: you might want to tokenize the string by dots instead, then pick the last 2 items of the array. That is easier than using a regex.
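
One possible way to do the tokenizing is with awk, which splits each line into fields and exposes the field count as NF. This is a sketch rather than the only approach; the function name grab_domain_fields is made up for the example, and lines with fewer than two fields are skipped by assumption.

```shell
# Hypothetical helper: split each line on dots and print the last two fields.
# -F. sets the field separator to a literal dot; NF is the number of fields.
grab_domain_fields() {
  awk -F. 'NF >= 2 { print $(NF-1) "." $NF }'
}

# Example: the depth of the subdomain and any hyphens no longer matter.
printf '%s\n' blog.clova.line.me yugioh.line-games-de-dev.net | grab_domain_fields
```

Because the fields are counted from the end, there is no need to check how many dots each line has.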
