PERL - work with txt files, and extracting the data in different variables

Time:09-27

I need to work with .txt files and filter them by the name and date stored in the file name.

So far I have achieved the following:

my $dir = "t-files\/";
chdir($dir);
foreach $files (glob('*.txt')) {
  ($sname) = split(/_/, $files);
  #($sdate) = "still under work"
  print "\nSwitch Name: $sname - Date: still under work";
}

File example names: "s-ar-ar55g-1_20140911-09.txt" | "s-ar-ar55g-1_20141027-09.txt" | etc.

With this script I have the following output:

D:\_perl>test_01.pl

Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
Switch Name: s-ar-ar55g-1 - Date: still under work
D:\_perl>

My intention is to extract the date string "20140911" from the file name and store it in a new variable, "sdate".

That way I would have two variables, so I could make comparisons by name and date.

Is it possible to extract the year, month and day, like "20140911", directly from the name of the txt file?

CodePudding user response:

You can always parse a string like this with a simple regex

my $file = 's-ar-ar55g-1_20140911-09.txt';

my ($sname, $date) = $file =~ /( [^_]+ ) _ ( [0-9]{8} )/x;

The /x modifier makes the regex engine ignore literal whitespace (spaces and newlines) in the pattern and allows comments with #, so that we can make the pattern more readable. As for the pattern itself, in the character class [] I use negation: [^_] matches any character other than _, and the + that follows means that there must be at least one such character. So that matches a string of characters up to the first _.

This is captured, because of surrounding (), and so is the pattern for a number which must repeat 8 times, [0-9]{8}. The two captured patterns are returned, and assigned to $sname and $date. See tutorial perlretut for starters, or your favorite good Perl book.
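To make the role of /x concrete, here is a minimal sketch of the same match laid out over several lines with comments inside the pattern. The check for a failed match is an addition for illustration; it relies on the fact that a failed match in list context returns an empty list.

```perl
use strict;
use warnings;

my $file = 's-ar-ar55g-1_20140911-09.txt';

# With the x modifier, whitespace and comments inside the pattern are ignored
my ($sname, $date) = $file =~ /
    ( [^_]+ )      # one or more characters that are not an underscore
    _              # the separator itself
    ( [0-9]{8} )   # exactly eight digits: YYYYMMDD
/x;

# A failed match returns an empty list, leaving both variables undefined
if (defined $sname) {
    print "Switch Name: $sname - Date: $date\n";
}
else {
    print "Unexpected file name: $file\n";
}
```

For the sample name this prints "Switch Name: s-ar-ar55g-1 - Date: 20140911".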

Note that I declare $sname with my, as I do all other variables when they are introduced. This is enforced by the strict pragma (use strict;), and you should always enable warnings (use warnings;) as well.
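Putting this together with the goal from the question (comparing by name and date), a script under strict and warnings might look like the sketch below. The filter values $want_name, $from and $to, and the third file name, are made up for illustration. The hard-coded list stands in for glob('*.txt').

```perl
use strict;
use warnings;

# Hypothetical filter criteria, chosen only for illustration
my $want_name = 's-ar-ar55g-1';
my ($from, $to) = ('20140901', '20140930');

my @files = (
    's-ar-ar55g-1_20140911-09.txt',
    's-ar-ar55g-1_20141027-09.txt',
    's-ar-ar55g-2_20140915-09.txt',   # made-up name for the example
);

my @selected;
for my $file (@files) {
    my ($sname, $date) = $file =~ /^ ( [^_]+ ) _ ( [0-9]{8} ) - /x
        or next;    # skip names that do not fit the pattern

    # YYYYMMDD strings have a fixed width, so string comparison orders them by date
    next unless $sname eq $want_name;
    next unless $date ge $from and $date le $to;

    push @selected, $file;
}

print "$_\n" for @selected;
```

Only s-ar-ar55g-1_20140911-09.txt passes both the name test and the date range here.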


The split you use is a great tool to reach for, but there is a little more to do with it here

my ($sname, $date) = split /_/, $file;
# Now need to remove the trailing `-09.txt` from $date
($date) = split /-/, $date, 2;
# or, with a regex
# $date =~ s/[^-]+\K.*//;  # remove the first - and all after it

That third argument in the second split, the 2, tells split to return at most two elements. So that'll be what's before the first - and then a string with everything after it.

We need () around $date to enforce list context; otherwise it would be assigned the number of elements of the returned list (2).
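The difference between the two contexts can be seen in a short sketch. The variable names $rest and $count are only for illustration.

```perl
use strict;
use warnings;

my $rest = '20140911-09.txt';

# List context: the parentheses make split return its fields,
# and the first one is assigned
my ($date) = split /-/, $rest, 2;

# Scalar context: the same call yields the number of fields instead
my $count = split /-/, $rest, 2;

print "date:  $date\n";    # date:  20140911
print "count: $count\n";   # count: 2
```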

Clearly a bit more work and consideration than the basic regex.

Another way, to push this argument further, would be to split on either _ or - and then assemble needed parts

my @parts = split /[_-]/, $file;
my ($sname, $date) = ( join('-', @parts[0..3]), $parts[4] );

Now we also have that @parts variable floating around, supposedly unneeded, so let's avoid that namespace pollution

my ($sname, $date) = do {
    my @parts = split /[_-]/, $file;
    join('-', @parts[0..3]), $parts[4];
};

(Now @parts, being declared as lexical my inside that do block, does not exist outside of it.)

This is a standard way to work with a string when parts of it need analysis and processing, but it is clearly overkill here.

CodePudding user response:

The following snippet uses a regex to capture four parts of a filename: everything before the underscore, the year (first 4 digits), the month (next 2 digits), and the day of the month (next 2 digits). As a sanity check, it also expects a dash followed by 2 digits, a dot, and txt as the file extension.

The output joins date parts with / for demonstration purpose only.

Note: replace while( <DATA> ) { with for ( glob('s-ar-*.txt') ) { to process the text files in the filesystem that match the file mask.

use strict;
use warnings;
use feature 'say';

while( <DATA> ) {
    # Skip lines that do not match, so stale captures are never reused
    next unless /([^_]*)_(\d{4})(\d{2})(\d{2})-\d{2}\.txt/;
    my($switch,$year,$month,$mday) = ($1,$2,$3,$4);
    say "Switch name: $switch - Date: " . join('/',$year,$month,$mday);
}


__DATA__
s-ar-ar55g-1_20140911-09.txt
s-ar-ar55g-1_20141027-09.txt

Output

Switch name: s-ar-ar55g-1 - Date: 2014/09/11
Switch name: s-ar-ar55g-1 - Date: 2014/10/27
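
The glob variant mentioned in the note might look like the following sketch. To keep it runnable anywhere, it first creates two empty sample files in a temporary directory; that setup, the anchored pattern, and the @lines array are assumptions for illustration. In a real script only the glob loop would be needed.

```perl
use strict;
use warnings;
use feature 'say';
use File::Temp 'tempdir';

# Throwaway directory with two sample files, so the sketch runs anywhere
my $dir = tempdir( CLEANUP => 1 );
for my $name ('s-ar-ar55g-1_20140911-09.txt', 's-ar-ar55g-1_20141027-09.txt') {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

chdir $dir or die "Cannot chdir to $dir: $!";

my @lines;
for ( sort glob('s-ar-*.txt') ) {
    # Skip any file whose name does not fit the expected layout
    next unless /^([^_]*)_(\d{4})(\d{2})(\d{2})-\d{2}\.txt$/;
    push @lines, "Switch name: $1 - Date: " . join('/', $2, $3, $4);
}
say for @lines;

chdir '/';   # leave the directory so it can be cleaned up at exit
```

With the two sample files in place, this prints the same two lines as the output shown above.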

Reference: Perl regular expression
