How to use Perl's Text::Aspell to spellcheck a text?

Time:10-05

I want to add spell checking to my Perl program. Looks like Text::Aspell should do what I need, but it only offers a function to check single words.

use strict;
use warnings;
use Text::Aspell;

my $input = "This doesn't look too bad. Me&you. with/without. 1..2..3..go!";
my $aspell = Text::Aspell->new();
$aspell->set_option('lang', 'en');
print "$input: ", $aspell->check($input), "\n";

This prints:

This doesn't look too bad. Me&you. with/without. 1..2..3..go!: 0

So it clearly takes only single words. How, then, do I separate a text into words? A simple split on whitespace:

foreach my $word (split /\s/, $input) {
    next unless($word =~ /\w/);
    print "$word: ", $aspell->check($word), "\n";
}

This runs into problems with punctuation marks that are not followed by whitespace:

This: 1
doesn't: 1
look: 1
too: 1
bad.: 0
Me&you.: 0
with/without.: 0
1..2..3..go!: 0

I guess I could list the punctuation characters (and digits) explicitly:

foreach my $word (split qr{[,.;!:\s#"\?&%@\(\)\[\]/\d]}, $input) {
    next unless($word =~ /\w/);
    print "$word: ", $aspell->check($word), "\n";
}

This produces reasonable output:

This: 1
doesn't: 1
look: 1
too: 1
bad: 1
Me: 1
you: 1
with: 1
without: 1
go: 1

but seems clumsy and I'm wondering if there is an easier (less code for me to write, less brittle) way.

How do I spell check a text?

CodePudding user response:

Text::Aspell has no option to check a whole string; it only checks single words. Instead of splitting the string yourself, I would suggest using a module that already does that for you, such as Text::SpellChecker. Its next_word method returns the next misspelled word in the text. For instance:

use strict;
use warnings;
use Text::SpellChecker;
use feature 'say';

my $input = "This doesn't look too bad. Me&you. with/without. 1..2..3..go!";
my $checker = Text::SpellChecker->new(text => $input);
$checker->set_options(aspell => { 'lang' => 'en' });

while (my $word = $checker->next_word) {
    say "Invalid word: $word";
}

Or, if you only need to know whether the string contains any misspelled word:

my $checker = Text::SpellChecker->new(text => $input);
$checker->set_options(aspell => { 'lang' => 'en' });

if ($checker->next_word) {
    say "The string is not valid.";
} else {
    say "The string is valid.";
}

The documentation of the module shows how you could interactively replace erroneous words:

while (my $word = $checker->next_word) {
    print $checker->highlighted_text, 
        "\n", 
        "$word : ",
        (join "\t", @{$checker->suggestions}),
        "\nChoose a new word : ";
    chomp (my $new_word = <STDIN>);
    $checker->replace(new_word => $new_word) if $new_word;
}

If you want to check each word of the input string individually yourself, you could have a look at how Text::SpellChecker splits the string into words (this is done in the next_word method). It uses the following regex:

while ($self->{text} =~ m/\b(\p{L}+(?:'\p{L}+)?)/g) {
    ...
}
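
If you would rather keep using Text::Aspell directly, the same pattern can be reused for the tokenizing step. The following is only a minimal sketch: extract_words and misspelled_words are names made up for this example, and the checker is passed in as a code ref so the snippet runs even without Aspell installed. The %known hash is a stand-in dictionary, not part of either module.

```perl
use strict;
use warnings;

# Pull out words with the pattern quoted above: a run of letters,
# optionally followed by an apostrophe and more letters.
sub extract_words {
    my ($text) = @_;
    my @words = $text =~ m/\b(\p{L}+(?:'\p{L}+)?)/g;
    return @words;
}

# Return the words for which the supplied checker returns false.
# $is_correct is any code ref; with Text::Aspell it could be
#   sub { $aspell->check($_[0]) }
sub misspelled_words {
    my ($text, $is_correct) = @_;
    return grep { !$is_correct->($_) } extract_words($text);
}

# Hypothetical stand-in dictionary so the sketch is self-contained.
my %known = map { lc($_) => 1 }
    qw(this doesn't look too bad me you with without go);
my $is_correct = sub { $known{ lc $_[0] } };

my $input = "This doesn't look too bad. Me&you. with/without. 1..2..3..go!";
print join(', ', extract_words($input)), "\n";
print join(', ', misspelled_words("This lookz bad", $is_correct)), "\n";
```

Passing the checker as a code ref keeps the word splitting independent of the spell-checking backend, so the same helper works with whichever dictionary you end up using.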

CodePudding user response:

The following snippet splits the sentence into words on runs of characters that are neither letters nor apostrophes.

You can extend the regex as needed.

use strict;
use warnings;

use Text::Aspell;

my $regex = qr/[^'a-z]+/i;
my $input = "This doesn't look too bad. Me&you. with/without. 1..2..3..go!";
my $aspell = Text::Aspell->new();

$aspell->set_option('lang', 'en');

printf "%12s: %d\n", $_, $aspell->check($_) for split($regex, $input);

Output

        This: 1
     doesn't: 1
        look: 1
         too: 1
         bad: 1
          Me: 1
         you: 1
        with: 1
     without: 1
          go: 1
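
One thing to watch with the split approach: if the text starts with a separator, split returns an empty leading field, and an apostrophe standing on its own survives as a "word". A small sketch of one way to guard against that, assuming it is acceptable to drop every field that contains no letter (split_words is just a name chosen for this example):

```perl
use strict;
use warnings;

# Split on runs of characters that are neither letters nor apostrophes,
# then keep only the fields that contain at least one letter. This drops
# the empty leading field and stray apostrophes.
sub split_words {
    my ($text) = @_;
    return grep { /[a-z]/i } split /[^'a-z]+/i, $text;
}

print join('|', split_words(q{...Hello, world! ' 42})), "\n";
```

The result of split_words can then be fed to $aspell->check one word at a time, as in the loop above.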