How can I match only integers in Perl?

Time:02-04

So I have an array that goes like this:

my @nums = (1,2,12,24,48,120,360);

I want to check whether that array contains an element that is not an integer, without using a loop. It goes like this:

if(grep(!/[^0-9]|\^$/,@nums)){
    die "Numbers are not in correct format.";
}else{
    #Do something
}

Basically, the format should not be like this (Empty string is acceptable):

1A

A2

@A

@

#######

More examples:

1,2,3,A3 = Unacceptable

1,2,###,2 = unacceptable

1,2,3A,4 = Unacceptable

1, ,3,4=Acceptable

1,2,3,360 = acceptable

I know that there is another way, using looks_like_number. But I can't use that for some reason (outside of my control/setup reasons). That's why I used the regex method.

My question is: even when the numbers are not in the correct format (A60, for example), the condition always returns false. Basically, it ignores the incorrect format.

CodePudding user response:

You say in the comments that you don't want to use modules because you can't install them, but there are many core modules that should come with Perl (although some systems screw this up).

zdim's answer in the comments is to look for anything that is not 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9. That's the negated character class [^0-9]. A grep in scalar context returns the number of items that match:

my $found_non_ints = grep { /[^0-9]/ } @items;
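
As a rough illustration of the context point, the same grep can be asked either for a count or for the matching items themselves. The helper names and the sample list below are made up for this sketch:

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this sketch.
sub count_non_digit_items {
    my $count = grep { /[^0-9]/ } @_;    # scalar context: number of matches
    return $count;
}

sub non_digit_items {
    my @found = grep { /[^0-9]/ } @_;    # list context: the matching items
    return @found;
}

my @items = (1, 2, 'A3', 360, '3A');     # made-up sample data
print "count: ", count_non_digit_items(@items), "\n";
print "items: ", join(' ', non_digit_items(@items)), "\n";
```

The count is enough for a yes/no check; the list form is handy when the offending items need to be reported.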

Instead of that, I'd go back to the non-negated character class and match a string that has only zero or more digits. To do this, anchor the pattern to the absolute start and end of the string:

my $found_non_ints = grep { ! /\A[0-9]*\z/  } @items;

But, this doesn't really match integers. It matches positive whole numbers (and zero). If you want to match negative numbers as well, allow an optional - at the start of the string:

my $found_non_ints = grep { ! /\A-?[0-9]*\z/  } @items;

That - would be a problem in the negated character class.

Also, you don't want the $ anchor here: it allows a trailing newline at the end of the string, and a newline is a non-digit (\Z allows the same trailing newline; only \z means the absolute end of the string). Also, the meaning of $ can change based on the setting of the /m flag, which might be set with default regex flags.
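
A small sketch of the difference between the two anchors, using two hypothetical helper subs that differ only in how the pattern is anchored:

```perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.
sub only_digits_dollar { return $_[0] =~ /^[0-9]*$/   ? 1 : 0 }
sub only_digits_z      { return $_[0] =~ /\A[0-9]*\z/ ? 1 : 0 }

for my $item ("120", "120\n", "12A", "") {
    (my $shown = $item) =~ s/\n/\\n/g;    # make a newline visible in the output
    printf "%-8s \$: %d  \\z: %d\n",
        "'$shown'", only_digits_dollar($item), only_digits_z($item);
}
```

The item with the trailing newline passes the $ version but is rejected by the \z version; the other items get the same verdict from both.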

Here's a short program with your sample data. Note that you need to decide how to split up the list; does whitespace matter? I decided to remove whitespace around the comma:

#!perl
use v5.10;

while( <DATA> ) {
    chomp;
    my $found_non_ints = grep { ! /\A[0-9]*\z/  } split /\s*,\s*/;
    say "$_ => Found $found_non_ints non-ints";
    }

__DATA__
1A
A2
@A
@
1,2,3,A3
1,2,###,2
1,2,3A,4
1, ,3,4
1,,3,4
1,2,3,360

CodePudding user response:

The solution proposed in the question gets close, except that the logic got reversed and there is an error in the regex pattern. One way to do it:

if ( grep { /[^0-9] | ^$/x } @nums ) { say 'not all integers' }

Regex explanation

  • [] is a character class: it matches any one of the characters listed inside (so [abc] matches any one of a, b, or c) -- but when it starts with a ^ it matches any character not listed; so [^abc] matches any character that is not a, b, or c. The pattern 0-9 inside a character class specifies all digits in that range (and we can also use a-z and A-Z)

    So [^0-9] matches any character that is not a digit

  • Then that is or-ed by | with a ^$: ^ matches the beginning of the string and $ is for the end of it. So ^$ matches a string without anything -- an empty string! We need to account for that, as [^0-9] doesn't, while an array element can be an empty string. (It can also be an undef, but from my understanding that is not possible with actual data, and a regex on undef would draw a warning.)

    Note that $ allows for a newline as well, and that ^ and $ may change their meaning if /m modifier is in use, matching on linefeeds inside a string. However, in all these cases we'd be matching a non-digit, which is precisely the point here

  • /x modifier makes it disregard literal spaces inside so we can space things out for easier reading. (It also allows for newlines and comments with #, so complex patterns can be organized and documented very nicely)

So that's all -- the regex tries to match anything that shouldn't be in an integer (assumed to be strictly positive in OP's data).
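
As a sketch of what /x allows, the same pattern can be spread over several lines with a comment on each part. The variable and sub names here are made up for the example:

```perl
use strict;
use warnings;

# Same pattern as above, laid out with /x so each part carries a comment.
my $non_int_re = qr{
    [^0-9]      # any character that is not a digit
    |           # or
    ^$          # a string with nothing in it
}x;

# Hypothetical helper: 1 if the item should be flagged, 0 otherwise.
sub is_non_int { return $_[0] =~ $non_int_re ? 1 : 0 }

for my $item ('360', '3A', '') {
    printf "|%s| flagged: %d\n", $item, is_non_int($item);
}
```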

If it matches any such, in any one of the array elements, then grep, which the if condition evaluates in scalar context, returns the number of matching elements; that number is at least one and so is "true" under if. So we caught a non-integer and we go into if's block to deal with that.

A little aside: we can also declare and populate an array right inside the if condition, to catch all those non-integers:

if ( my @non_ints = grep { /[^0-9] | ^$/x } @nums ) { 
    say 'Non-integers: ', join ' ', map { "|$_|" } @non_ints;
}

This also reads more nicely, since the array name tells what we're after in that complicated condition: "non_ints." I put || around each item in the print to be able to see an empty string.

Now, when you put an exclamation mark in front of that regex, it reverses the true/false return from the regex and our code goes haywire. So drop that !.

The other error is in escaping the ^ by having \^. This would match a literal ^ character, robbing ^ of its special meaning as a pattern in regex, explained above. So drop that \.
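
To see the effect of those two fixes side by side, here is a rough sketch that wraps the condition from the question and the repaired one in two hypothetical subs and runs both on a few made-up lists:

```perl
use strict;
use warnings;

# The condition from the question, unchanged: negated match, escaped caret.
sub original_check  { return scalar( grep { !/[^0-9]|\^$/ } @_ ) }

# The same condition with the leading ! and the backslash removed.
sub corrected_check { return scalar( grep { /[^0-9]|^$/ } @_ ) }

for my $case ( [1, 2, 3], ['A60'], [1, 2, 'A60'] ) {
    printf "%-12s original: %d  corrected: %d\n",
        "(@$case)", original_check(@$case), corrected_check(@$case);
}
```

With the negation in place, the original ends up counting the elements that look fine rather than the ones that don't, so a lone 'A60' gives 0 (false) there, while the corrected check gives 1.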


Another way is to use the extremely useful List::Util library, which is "core" (so it is normally installed with Perl, even though that can get messed up).

Among a number of essential functions it gives us any, and with it we have

use List::Util qw(any);

if ( any { /[^0-9]|^$/ } @nums ) { say 'not all integers' }

I like any firstly because the name of the function includes at least a part of the needed logic, making code that much clearer and easier to comprehend: is there any element of @nums for which the code in the block is true? So any element which contains a non-digit? Precisely what is needed here.

Then, another advantage is that any will quit as soon as it finds one match, while grep continues through the whole list. But this efficiency advantage shows only on very large arrays or a lot of repeated checks. On the other hand, sometimes we want to count all instances, and any cannot do that.
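
One way to watch that early exit happen is to count how many elements each function actually looks at. This is only a sketch; the helper subs and the data are invented for it:

```perl
use strict;
use warnings;
use List::Util qw(any);

# Hypothetical helpers: each returns how many elements its block examined.
sub examined_by_any {
    my $examined = 0;
    my $found = any { $examined++; /[^0-9]/ } @_;
    return $examined;
}

sub examined_by_grep {
    my $examined = 0;
    my $found = grep { $examined++; /[^0-9]/ } @_;
    return $examined;
}

my @nums = ('A1', 2, 3, 4);    # made-up data with the bad item first
print "any examined:  ", examined_by_any(@nums),  "\n";
print "grep examined: ", examined_by_grep(@nums), "\n";
```

Here any stops after the first element, since that one already matches, while grep goes through all four.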

I'd also like to point out some of any's siblings: none and notall. These names themselves also capture a good deal of logic, making otherwise possibly convoluted code that much clearer. Browse through this library to get accustomed to what is in there.
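
A minimal sketch of how those two might read for this problem, assuming a List::Util recent enough to export them (version 1.33 or later, to my knowledge). The sub names are hypothetical:

```perl
use strict;
use warnings;
use List::Util qw(none notall);

# True (1) when no element contains a non-digit or is empty.
sub all_look_like_ints {
    return ( none { /[^0-9]|^$/ } @_ ) ? 1 : 0;
}

# True (1) when at least one element fails the "digits only" test.
sub some_are_not_ints {
    return ( notall { /\A[0-9]+\z/ } @_ ) ? 1 : 0;
}

for my $case ( [1, 2, 360], [1, '3A', 4] ) {
    printf "(%s) all ints: %d  some not ints: %d\n",
        join(',', @$case), all_look_like_ints(@$case), some_are_not_ints(@$case);
}
```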


A program with your test data

use warnings;
use strict;
use feature 'say';

while (<DATA>) {
    chomp;
    my @nums = split /\s*,\s*/;
    say "Data: @nums";
    
    if ( my @non_ints = grep { /[^0-9] | ^$/x } @nums ) { 
        say 'Non-ints: ', join ' ', map { "|$_|" } @non_ints;
    }
    say '---';
}
    
__DATA__
1A
A2
@A
@
1,2,3,A3
1,2,###,2
1,2,3A,4
1, ,3,4
1,2,3,360