I have two files. File1 contains a list of email addresses. File2 contains a list of domains.
I want to filter out all the email addresses whose domain exactly matches one of the listed domains, using a Perl script.
I am using the code below, but I don't get the correct result.
#!/usr/bin/perl
#use strict;
#use warnings;
use feature 'say';
my $file1 = "/home/user/domain_file" or die " FIle not found\n";
my $file2 = "/home/user/email_address_file" or die " FIle not found\n";
my $match = open(MATCH, ">matching_domain") || die;
open(my $data1, '<', $file1) or die "Could not open '$file1' $!\n";
my @wrd = <$data1>;
chomp @wrd;
# loop on the fiile to be searched
open(my $data2, '<', $file2) or die "Could not open '$file2' $!\n";
while(my $line = <$data2>) {
chomp $line;
foreach (@wrd) {
if($line =~ /\@$_$/) {
print MATCH "$line\n";
}
}
}
File1
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
File2
yahoo.com
gmail.com
Expected output
[email protected]
[email protected]
CodePudding user response:
First off, since you seem to be on *nix, you might want to check out grep -f, which can take search patterns from a given file. I'm no expert in grep, but I would try combining it with the "match whole words" option (-w), and this should be fairly easy.
Second: Your Perl code can be improved, but it works as expected, provided you put the emails and domains in the files as indicated by your code. It may be that you have mixed the files up.
If I run your code, fixing only the paths, and keeping the domains in file1, it does create the file matching_domain, and it contains your expected output:
[email protected]
[email protected]
So I don't know what you think your problem is (because you did not say). Maybe you were expecting it to print output to the terminal. Either way, it does work, but there are things to fix.
#use strict;
#use warnings;
It is a huge mistake to remove these two. It is the biggest mistake you will ever make while coding Perl. It will not remove your errors, just hide them. You will spend 10 times as much time fixing bugs. Uncommenting them should be the first thing you do to fix this.
use feature 'say';
You never use this. You could for example replace print MATCH "$line\n" with say MATCH $line, which is slightly more concise.
my $file1 = "/home/user/domain_file" or die " FIle not found\n";
my $file2 = "/home/user/email_address_file" or die " FIle not found\n";
This is very incorrect. You are placing a condition on the creation of a variable. If the condition fails, does the variable exist? Don't do this. I assume this is to check if the file exists, but that is not what this does. To check if a file exists, you can use -e, documented under perldoc -f -X (various file tests).
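To make the -e test concrete, here is a minimal sketch. The helper name file_exists and the paths are made up for illustration and are not part of the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a path exists, using the -e file test.
sub file_exists {
    my ($path) = @_;
    my $exists = -e $path;
    return $exists ? 1 : 0;
}

# "." is the current directory; the second path is assumed not to exist.
for my $path (".", "/no/such/dir/no_such_file") {
    print "$path: ", (file_exists($path) ? "exists" : "does not exist"), "\n";
}
```

In a script this short the explicit test adds little, because a failing open already reports the problem through $!.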
Furthermore, a string such as "/home/user..." is TRUE ("truthy") as far as Perl conditions are concerned. A value is only false if it is "0" (zero), "" (empty) or undef (undefined). So your or clause will never be executed. E.g. "foo" or die will never die.
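If you want to see the truthiness rules for yourself, a small sketch like this will do. The helper is_true is a made-up name, used only for the demonstration:

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if Perl treats the value as true, 0 otherwise.
sub is_true {
    my ($value) = @_;
    return $value ? 1 : 0;
}

# A path string and "foo" are true; "0" and the empty string are false.
for my $value ("/home/user/domain_file", "foo", "0", "") {
    printf "'%s' is %s\n", $value, is_true($value) ? "true" : "false";
}
print "undef is ", (is_true(undef) ? "true" : "false"), "\n";
```

Since the path string is always true, an or die attached to that assignment has nothing to catch.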
Lastly, this test is quite meaningless, as you will be testing this in your open statement later on anyway. If the file does not exist, the open will fail and your program will die.
my $match = open(MATCH, ">matching_domain") || die;
This is also very incorrect. First off, you never use the $match variable. Secondly, I bet it does not contain what you think it does (it contains a boolean which states whether open was successful or not, see perldoc -f open). Thirdly, again, don't put conditions on my declarations of variables, it is a bad idea.
What this statement really means is that $match will contain either the return value of the open, or the return value of die. This should probably be simply:
open my $match, ">", "matching_domain" or die "Cannot open 'matching_domain': $!";
Also, use the three-argument open with an explicit open MODE, and use lexical file handles, like you have done elsewhere.
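As a rough sketch of that advice, here is a three-argument open with a lexical handle, combined with say. The helper name write_lines and the sample lines are made up. The demo writes to an in-memory buffer (a scalar reference in place of a file name) so it does not touch your disk:

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: write each line to $path using a three-argument
# open and a lexical filehandle. Returns the number of lines written.
sub write_lines {
    my ($path, @lines) = @_;
    open my $out, '>', $path or die "Cannot open '$path': $!\n";
    say {$out} $_ for @lines;    # say appends the newline
    close $out or die "Cannot close '$path': $!\n";
    return scalar @lines;
}

# Passing a reference to a scalar makes open write to that scalar.
my $buffer = '';
write_lines(\$buffer, 'first line', 'second line');
print $buffer;
```

With a real file you would pass its name, for example "matching_domain", instead of \$buffer.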
And one more thing on top of all the stuff I've already badgered you with: I don't recommend hard coding output files for small programs like this. If you want to redirect the output, use shell redirection: perl foo.pl > output.txt. I think this is what has prompted you to think something is wrong with your code: You don't see the output.
Other than that, your code is fine, as near as I can tell. You already chomp the lines from the domain file, which is needed because the pattern is matched against a chomped $line. Also remember that indentation is a good thing, and it helps you read your code. I mentioned this in a comment, but it was removed for some reason. It is important though.
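Putting the points above together, a cleaned-up version might look roughly like this. It is a sketch, not the only way to do it. It swaps the nested regex loop for a hash lookup on the part after the last "@", compares domains case-insensitively (my assumption), and takes the two file names from the command line instead of hard coding them. The names matching_emails and read_lines are made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

# Return the emails whose domain (the text after the last "@") is exactly
# one of the given domains. Case-insensitive comparison is an assumption.
sub matching_emails {
    my ($domains, $emails) = @_;
    my %wanted = map { ( lc($_) => 1 ) } @$domains;
    return grep {
        my ($domain) = /\@([^\@]+)\z/;
        defined $domain && $wanted{ lc $domain };
    } @$emails;
}

# Read all lines of a file, chomped; die with the reason on failure.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Could not open '$path': $!\n";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Only runs when given exactly two arguments: domain file, then email file.
if (@ARGV == 2) {
    my @domains = read_lines($ARGV[0]);
    my @emails  = read_lines($ARGV[1]);
    say for matching_emails(\@domains, \@emails);
}
```

Assuming you save it as foo.pl, you would run it as perl foo.pl domain_file email_address_file > output.txt and let the shell handle the output file.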
Good luck!
CodePudding user response:
This assumes that the lines labeled File1 are in the file pointed to by $file1 and the lines labeled File2 are in the file pointed to by $file2.
You have your variables swapped. You want to match what is in $line against $_, not the other way around:
# loop on the file to be searched
open( my $data2, '<', $file2 ) or die "Could not open '$file2' $!\n";
while ( my $line = <$data2> ) {
    chomp $line;
    foreach (@wrd) {
        if (/\@$line$/) {
            print MATCH "$_\n";
        }
    }
}
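One thing to watch when a domain is interpolated straight into a regex like this: the "." in yahoo.com is a regex metacharacter and matches any character. If the match really has to be exact, quoting the domain with \Q...\E is one option. A small sketch, with a made-up helper name and a made-up address:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $email ends with "@$domain", treating the
# domain as literal text rather than as a regex.
sub ends_with_domain {
    my ($email, $domain) = @_;
    return $email =~ /\@\Q$domain\E\z/ ? 1 : 0;
}

my $domain = 'yahoo.com';
my $email  = 'bob@yahooXcom';    # made-up address, not from the question

# Unquoted, the "." matches the "X"; quoted, it only matches a literal dot.
print "unquoted pattern matches\n" if $email =~ /\@$domain$/;
print "quoted pattern matches\n"   if ends_with_domain($email, $domain);
```

For ordinary domain lists the difference rarely shows up, but it costs nothing to quote.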
You should un-comment the warnings and strict lines:
use strict;
use warnings;
warnings shows you that the or die checks are not really working the way you intended in the file name assignment statements. Just use:
my $file1 = "/home/user/domain_file";
my $file2 = "/home/user/email_address_file";
You are already doing the checks where they belong (on open).