I have a huge ispell .mwl file and I want to remove all the ispell suffixes to generate a simple, text-only word dictionary
using Unix ispell, bash or perl commands.
Are there ispell command options to do that?
(On Unix, the .mwl.gz files are located in the /usr/share/ispell/ directory.)
A short, non-exhaustive extract of the file:
a/MRSY
A'asia
a'body
a'thing
aaa
AAAS
Aaberg/M
Aachen/M
Aaedon/M
AAeE
AAeE's
aaerially
aaerialness
Aaerope/M
AAgr/M
aah/DGS
aal/MS
Aalborg
Aalesund
aalii/MS
Aaliyah/M
Aalst/M
Aalto
aam
Aandahl/M
Aani/M
Aaqbiye/M
Aar/MN
Aara/M
Aarau
aardvark/MS
aardwolf/M
aardwolves
Aaren/M
Aargau
aargh
Aarhus
Aarika/M
aarogramme
I'm not sure what you mean by suffix, but I'll assume it's the part following the / or ' in your sample text. You can strip it with a simple pipeline from Bash:
cat something.mwl | perl -pe 's{[/\x27].*$}{}; ' > stripped_something.txt
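The -p switch wraps your code in a loop: each input line (from standard input here, or from files named on the command line) is put into $_, your code runs on it, and then $_ is printed. Notice that I wrote \x27 for the apostrophe in the regex, because escaping a literal ' inside a single-quoted shell argument is a big pain. If any other characters start a suffix, add them to the character class.
Be aware that in ispell word lists the affix flags follow a /, while an apostrophe is usually part of the word itself (A'asia, AAeE's). If you want to keep such words whole, drop \x27 from the class and use s{/.*$}{} instead.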
You can also do any other work on the line before it is printed.
See the perlrun documentation for more about the -p switch.