Sorting upper case and lower case WITH PERL

Time:12-02

I am trying to sort upper-case and lower-case words in Perl. The text is saved in a file named "electricity.txt":

Today's scientific question is: What in the world is electricity and where does it go after it leaves the toaster?

Here is a simple experiment that will teach you an important electrical lesson: On a cool dry day, scuff your feet along a carpet, then reach your hand into a friend's mouth and touch one of his dental fillings. Did you notice how your friend twitched violently and cried out in pain? This teaches one that electricity can be a very powerful force, but we must never use it to hurt others unless we need to learn an important lesson about electricity.

Somehow, I can't get any upper-case words in the output. My code is:

my %count;
my $openFileile = "electricity.txt";
open my $openFile, '<', $openFileile;
while (my $list = <$openFile>) {
  chomp $list;
  foreach my $word (split /\s+/, $list) {
    $count{lc($word)}++;
  }
}
printf "\n\nSorting Alphabetically with upper case words in front of lower-case words with the same initial characters\n";
foreach my $word (sort keys %count){
  printf "%-31s \n", sort {"\$a" cmp uc"\$b"} lc($word);

}

CodePudding user response:

As Hunter McMillen's comment says, you are using lc on the words when creating the hash, so all of the original capitalization is lost. Let's go through your code, as I spot some other mistakes.

First off, always start your script with use strict; use warnings; especially if you have a preference for long and complicated variable names. It will save you from typos and weird bugs.

open my $openFile, '<', $openFileile;

With open statements, it is idiomatic to check the return value of the open, to see if anything went wrong. And if it did, to report the error. I.e. add ..., or die "Cannot open '$openFileile': $!".
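A minimal sketch of that check. The helper name read_lines is only for illustration and is not part of the original code; the file name in the usage comment is the one from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading and return its lines.
# Dies with the file name and the system error ($!) if open fails.
sub read_lines {
    my ($file) = @_;
    open my $fh, '<', $file
        or die "Cannot open '$file': $!";
    chomp(my @lines = <$fh>);   # read all lines, strip trailing newlines
    close $fh;
    return @lines;
}

# Usage, assuming the file exists in the current directory:
# my @lines = read_lines("electricity.txt");
```

If the file is missing, the script stops immediately with a message naming the file, instead of silently reading nothing.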

  foreach my $word (split /\s+/, $list) {

Typically, when you split on whitespace you want to split on ' ' (a single space). This is a special case for split, and also its default mode: it splits on \s+, but also discards leading whitespace.
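The difference only shows up when a line starts with whitespace. A small sketch with a made-up input line:

```perl
use strict;
use warnings;

# Made-up input with leading and repeated whitespace.
my $line = "  Today's scientific   question";

# The special pattern ' ' skips leading whitespace entirely.
my @special = split ' ', $line;

# A regex pattern keeps an empty leading field when the line
# starts with whitespace.
my @regex = split /\s+/, $line;

print scalar(@special), "\n";   # 3 fields: Today's, scientific, question
print scalar(@regex), "\n";     # 4 fields: the first one is the empty string
```

With the regex form, that empty string would end up as a key in the word-count hash.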

    $count{lc($word)}++;

Here is your problem. All the words lose their original case.

printf "\n\nSorting Alphabetically with upper case words in front of lower-case words with the same initial characters\n";

printf is a special formatting print. If you do not intend to use that formatting, use the regular print to avoid problems.

  printf "%-31s \n", sort {"\$a" cmp uc"\$b"} lc($word);
  1. You cannot sort just one (1) word. You need at least 2 words to be able to sort.
  2. Why are you using double quotes, and then escaping the variable sigil? I am guessing this is you testing different things to see what works. This looks very unlikely to do what you want. "\$a" will just become $a -- a dollar sign plus an "a". This is what you do when you want to print the variable name, e.g. print "\$a is $a" (prints $a is 12, for example).
  3. lc will have no effect, since all your words are already in lower case.
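  4. Even if lc and uc did work here, you cannot apply uc to only one side of the comparison in the sort subroutine. Which word ends up in $a and which in $b is decided by sort, not by you, so in effect an arbitrary word in each comparison gets upper-cased. That makes the comparison inconsistent and effectively destroys your sort.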

Also uc will change all the letters to upper case (cat => CAT). You want ucfirst (cat => Cat).
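The heading printed by the question describes an alphabetical order in which the capitalized form comes first among otherwise equal words. One possible comparator for that, as a sketch rather than anything from the original code: compare lower-cased copies of both sides, and fall back to a plain cmp to break ties. The word list is made up for illustration.

```perl
use strict;
use warnings;

# Sample words standing in for hash keys that kept their original case.
my @words = qw(did apple Did banana Apple a);

# Compare both sides case-insensitively; on a tie, plain cmp puts
# the capitalized form first because "A" sorts before "a" in ASCII.
my @sorted = sort { lc($a) cmp lc($b) or $a cmp $b } @words;

print "@sorted\n";                            # a Apple apple banana Did did

# uc upper-cases the whole word, ucfirst only the first letter.
print uc("cat"), " ", ucfirst("cat"), "\n";   # CAT Cat
```

Because lc is applied to both $a and $b, the comparison gives the same answer no matter which word sort places on which side.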

When I clean up your code, and also make the variable names somewhat more reasonable, I get this below. Also, I removed your file open, since I use the internal DATA file handle to facilitate testing. You can just put back your own open, with the additions I described above.

use strict;
use warnings;

my %words;
while (my $line = <DATA>) {
    for my $word (split ' ', $line) { # split on ' ' (a single space) removes leading and trailing whitespace
        my $key = lc $word;           # save lowercase word as key
        $words{$key}{count}++;        # separate count
        $words{$key}{value} = $word;  # word original formatting as value
    }
}

# printf is used for special formatting, if you are not using that formatting, use regular print to avoid unnecessary interpolation of %
print "\nSorting Alphabetically with upper case words in front of lower-case words with the same initial characters\n";
for my $word (sort keys %words) {
    printf "%-31s : %s\n", $words{$word}{value}, $words{$word}{count};
}

__DATA__
Today's scientific question is: What in the world is electricity and where does it go after it leaves the toaster?
Here is a simple experiment that will teach you an important electrical lesson: On a cool dry day, scuff your feet along a carpet, then reach your hand into a friend's mouth and touch one of his dental fillings. Did you notice how your friend twitched violently and cried out in pain? This teaches one that electricity can be a very powerful force, but we must never use it to hurt others unless we need to learn an important lesson about electricity.

And it prints

a                               : 5
about                           : 1
after                           : 1
along                           : 1
an                              : 2
and                             : 3
be                              : 1
but                             : 1
can                             : 1
carpet,                         : 1
cool                            : 1
...etc

As you can see, this treats "carpet" and "carpet," as different words, since you are only splitting on whitespace. It keeps the non-word characters and includes them in the hash key. There are different ways to find words in a text. For example, instead of split you could use a regex:

my @words = $line =~ /\w+/g;   # \w matches word characters: letters, digits, and underscore _

Even this is simplistic, but it will work better than your split. You can add characters to the regex as your needs require, for example: /[\w\-]+/ -- include dash for hyphenated words, e.g. mega-carpet. (Note that dash - has to be escaped when placed between other characters inside a character class bracket, otherwise it will be interpreted as a range, e.g. a-z.)
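A short sketch comparing the two patterns on a made-up line that contains the hyphenated example word:

```perl
use strict;
use warnings;

# Made-up input containing a hyphenated word followed by a comma.
my $line = "scuff your feet along a mega-carpet, then reach";

# \w+ alone drops the comma but also splits the hyphenated word in two.
my @plain = $line =~ /\w+/g;

# Adding an escaped dash to the class keeps "mega-carpet" whole.
my @hyphen = $line =~ /[\w\-]+/g;

print scalar(@plain), "\n";     # 9
print "$hyphen[5]\n";           # mega-carpet
```

Note that neither pattern includes the apostrophe, so a word like friend's would still be split; whether to add it to the character class depends on what you want to count as a word.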

CodePudding user response:

Issue 1

The first problem is that the statement below stores only the lower-case version of each word:

$count{lc($word)}++;

After the initial while loop %count has only lower-case words. That means your foreach loop can never retrieve the upper-case words.
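To make the effect visible, here is a small sketch that fills two hashes from the same made-up word list, one keyed on lc($word) and one keyed on the word as written. The hash names are chosen for illustration only.

```perl
use strict;
use warnings;

my (%lower, %original);
for my $word (qw(Did did This carpet)) {
    $lower{ lc $word }++;    # key is always lower case
    $original{$word}++;      # key keeps the case it had in the text
}

print join(" ", sort keys %lower), "\n";      # carpet did this
print join(" ", sort keys %original), "\n";   # Did This carpet did
```

Once the key has been lower-cased there is nothing left in the hash to recover the original spelling from.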

Issue 2

Second issue is this statement

printf "%-31s \n", sort {"\$a" cmp uc"\$b"} lc($word);

I am not sure what you expect the sort to achieve here -- it is sorting a list with only one element, lc($word), so it doesn't actually do anything.

A working example

Taking the comments above into account, here is a version that outputs both upper & lower-case words (abbreviated)

use strict;
use warnings;

my %count;
#my $openFileile = "electricity.txt";
#open my $openFile, '<', $openFileile;
while (my $list = <DATA>) {
  chomp $list;
  foreach my $word (split /\s+/, $list) {
    $count{$word}++;
  }
}

printf "\n\nSorting Alphabetically with upper case words in front of lower-case words with the same initial characters\n";
foreach my $word (sort keys %count){
  printf "%-31s \n", $word;

}
__DATA__
Today's scientific question is: What in the world is electricity and where does it go after it leaves the toaster?

Here is a simple experiment that will teach you an important electrical lesson: On a cool dry day, scuff your feet along a carpet, then reach your hand into a friend's mouth and touch one of his dental fillings. Did you notice how your friend twitched violently and cried out in pain? This teaches one that electricity can be a very powerful force, but we must never use it to hurt others unless we need to learn an important lesson about electricity.

That prints this

Sorting Alphabetically with upper case words in front of lower-case words with the same initial characters
Did                             
Here                            
On                              
This                            
Today's                         
What                            
a                               
about                           
after                           
along    
...
use                             
very                            
violently                       
we                              
where                           
will                            
world                           
you  
