Perl Regular Expression match ip and host from /etc/hosts

Time:10-05

I am looking for a regex that matches the IP address and all hosts listed for that address in /etc/hosts.

Example hosts file:

10.10.10.10  test.com test2.com
10.10.10.11  test1.com 
10.10.10.12  test3.com test5.com

Used regular expression:

^(\s+)?(?<Address>[0-9.:]+)(\s+(?<Host>[\w.-]+))+$

Expected output:

Address: ["10.10.10.10"]
Host: ["test.com","test2.com"]

Address: ["10.10.10.11"]
Host: ["test1.com"]

Address: ["10.10.10.12"]
Host: ["test3.com","test5.com"]

Example code:

use strict;
use Data::Dumper;

my @str = ( "10.10.10.10  test.com test2.com",
            "10.10.10.11  test1.com",
            "10.10.10.12  test3.com test5.com");

foreach ( @str  )
{
    while ($_ =~ m/^(\s+)?(?<Address>[0-9.:]+)(\s+(?<Host>[\w.-]+))+$/img) {
       print Dumper(\%+) ;
    }
}

CodePudding user response:

Since none of the fields can contain spaces, and the address always comes first, you can simply capture all non-space sequences:

my ($address, @hosts) = /(\S+)/g;

Then place them in a suitable data structure, for example

use warnings;
use strict;
use feature 'say';
use Data::Dumper;

my @str = ( 
    "10.10.10.10  test.com test2.com", 
    "10.10.10.11  test1.com", 
    "10.10.10.12  test3.com test5.com" );

my %host;

foreach (@str) {
    my ($address, @hosts) = /(\S+)/g;
    $host{$address} = \@hosts;
}

say Dumper \%host;

As for the attempt in the question: that regex has a pattern for an address followed by a pattern for a host name, and as written it yields an address and only one host (despite a fair attempt to match multiple hosts).

On the next iteration of that while loop the regex tries to match from where the previous match ended, sees no address ahead in the string, and fails. So we get an address and one host. (Why not show the output as well?)
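To make that behaviour concrete, here is a minimal sketch that runs the question's regex on the first sample line and records what each loop iteration captures. The @seen array is only an illustrative name. Note that a repeated capture group keeps just its last repetition, so the single host reported is whichever one the group matched last.

```perl
use strict;
use warnings;

my $line = "10.10.10.10  test.com test2.com";
my @seen;    # one "address => host" entry per successful match

# The question's regex, run in a while loop with /g as in the question
while ($line =~ m/^(\s+)?(?<Address>[0-9.:]+)(\s+(?<Host>[\w.-]+))+$/img) {
    # A repeated capture group retains only its final repetition
    push @seen, "$+{Address} => $+{Host}";
}

print scalar(@seen), " match(es): @seen\n";
# prints: 1 match(es): 10.10.10.10 => test2.com
```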

In order for that pattern to match multiple hosts (after an address), one would have to get those quantifiers (+, or rather *) just right so as to allow failed matches, in a very unintuitive way.

Also, with the pattern as it stands there is no reason for a while loop there as that pattern is intended to match everything at once.
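If a regex is still preferred, one way to avoid both the loop and the repeated group is to match each line once, capturing the address and the remainder, and then break the remainder into hosts. This is only a sketch of that idea, not the question's pattern; the names Rest, $rest and %host_for are made up for the illustration.

```perl
use strict;
use warnings;

my @lines = (
    "10.10.10.10  test.com test2.com",
    "10.10.10.11  test1.com",
    "10.10.10.12  test3.com test5.com",
);

my %host_for;    # address => array reference of hosts

for my $line (@lines) {
    # One match per line: the address, then everything after it
    if ($line =~ /^\s*(?<Address>[0-9.:]+)\s+(?<Rest>\S.*)$/) {
        my ($address, $rest) = ($+{Address}, $+{Rest});
        $host_for{$address} = [ split ' ', $rest ];
    }
}

for my $address (sort keys %host_for) {
    print "Address: $address\n";
    print "Host: @{ $host_for{$address} }\n\n";
}
```

The remainder still has to be separated on whitespace, which is why this ends up close to the plain capture of non-space sequences.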

Another way would be to use \G to parse the string, which would be suitable if your string had the items of interest mixed up in an arbitrary order.
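A minimal sketch of what \G-based parsing might look like, assuming a hypothetical input in which addresses and host names can appear in any order. The token rules (four dot-separated numbers for an address, any other non-blank run for a host) are simplifications chosen for the illustration.

```perl
use strict;
use warnings;

my $line = "test.com 10.10.10.10 test2.com";
my (@addresses, @hosts);

# Each /gc match resumes at \G, where the previous one ended;
# /c keeps that position when an alternative fails to match.
while (1) {
    if    ($line =~ /\G\s+/gc)                       { next }
    elsif ($line =~ /\G(\d+(?:\.\d+){3})(?=\s|$)/gc) { push @addresses, $1 }
    elsif ($line =~ /\G(\S+)/gc)                     { push @hosts, $1 }
    else                                             { last }   # nothing left to match
}

print "Addresses: @addresses\n";    # prints: Addresses: 10.10.10.10
print "Hosts: @hosts\n";            # prints: Hosts: test.com test2.com
```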

While all that is legitimate, the approach used here is much simpler (really just a split in disguise).

CodePudding user response:

The structure of /etc/hosts lends itself to using split to process the input data.

A regex can also be used for this task, but split probably makes the solution much easier.

use strict;
use warnings;

my %hosts;

while( <DATA> ) {
    my($ip,@fqdn) = split;
    $hosts{$ip}=\@fqdn;
}

for my $ip ( keys %hosts ) {
    printf "Address: [%s]\n", $ip;
    printf "Host:    [%s]\n\n", join( ', ', @{$hosts{$ip}});
}

__DATA__
10.10.10.10  test.com test2.com
10.10.10.11  test1.com 
10.10.10.12  test3.com test5.com

Output

Address: [10.10.10.11]
Host:    [test1.com]

Address: [10.10.10.12]
Host:    [test3.com, test5.com]

Address: [10.10.10.10]
Host:    [test.com, test2.com]
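
The output above is not in input order because keys returns hash entries in no particular order. A real /etc/hosts usually also contains comments and blank lines, which the sample data omits. The sketch below is one possible extension, under the assumption that '#' starts a comment running to the end of the line; it also sorts the keys for stable output. The lines in @lines are made up for the illustration.

```perl
use strict;
use warnings;

# Made-up sample input, including a comment line, a blank line,
# and a trailing comment
my @lines = (
    "# local test machines",
    "",
    "10.10.10.10  test.com test2.com   # primary",
    "10.10.10.11  test1.com",
);

my %hosts;

for (@lines) {
    s/#.*//;               # drop a comment, if any
    next unless /\S/;      # skip lines that are now empty
    my ($ip, @fqdn) = split;
    # Append, so an address listed on several lines keeps all its hosts
    push @{ $hosts{$ip} }, @fqdn;
}

for my $ip (sort keys %hosts) {
    printf "Address: [%s]\n", $ip;
    printf "Host:    [%s]\n\n", join(', ', @{ $hosts{$ip} });
}
```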