I'm new to Perl and I would like to learn how to use loops with it. I have multiple directories, and each directory contains a file named data.txt. The data.txt file has several columns. I basically need to use a loop to calculate the mean of each column for each data.txt file.
I have this command that does the job for a single file:
perl -lane 'for $c (0..$#F){$t[$c] += $F[$c]}; END{for $c (0..$#t){print $t[$c]/$.}}' data.txt
I want to write a script that visits every directory, reads the data.txt file in it, and applies the command.
Example:
data.txt:
-79.2335 0.4041 71.9143 1.3392 -0.7687 0.0212 -8.0934 1.1425
-74.4163 0.6188 60.0468 1.8782 -0.8540 0.0305 -15.1574 1.4755
-74.4118 0.6046 62.1771 1.8058 -0.9143 0.0304 -13.2272 1.3408
-74.3895 0.5935 66.4264 1.6532 -0.8509 0.0223 -8.8819 1.2670
-74.3192 0.5589 67.1619 1.4763 -0.9656 0.0274 -8.1090 1.1450
-73.8272 0.6274 61.6632 1.7554 -0.8840 0.0256 -13.0435 1.3641
-73.3525 0.5856 60.6622 1.7872 -0.8489 0.0222 -13.5014 1.3947
-73.3206 0.6275 53.3129 2.2961 -0.7962 0.0337 -20.8195 1.8538
-72.5461 0.5212 62.0359 1.4267 -0.9378 0.0240 -11.4203 1.0295
-72.3058 0.7225 56.2304 2.1480 -0.7539 0.0293 -16.7954 1.5952
-72.1180 0.6460 51.7954 2.0845 -0.8479 0.0265 -21.0355 1.4630
-72.0690 0.4905 58.8372 1.3918 -0.9866 0.0333 -14.1823 1.1045
-71.7949 0.5799 55.6006 1.9189 -0.8541 0.0313 -17.0112 1.4530
-71.3074 0.4482 45.9271 2.1135 -0.6637 0.0354 -25.9309 1.8761
-71.2542 0.4879 57.3196 1.5406 -0.9523 0.0281 -14.9113 1.2705
-71.2421 0.5480 47.9065 2.2445 -0.8107 0.0352 -24.2489 1.7997
-70.3751 0.5278 49.5489 1.8395 -0.8208 0.0371 -21.5205 1.4994
-69.2181 0.4823 54.8234 1.0645 -0.9897 0.0246 -15.3506 0.9369
-69.0456 0.4650 40.3798 2.0117 -0.6476 0.0360 -29.3403 1.7013
-66.5402 0.5006 42.1805 1.7872 -0.7692 0.0356 -25.1431 1.4522
Output:
-72.354355 0.552015 56.297505 1.77814 -0.845845 0.029485 -16.88618 1.408235
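The single-file command prints its result from an END block using $., so if it were given several files at once it would print one combined result: the sums over all files divided by the total line count. A minimal multi-file sketch, assuming whitespace-separated numeric columns, can use Perl's eof (without parentheses) to detect the end of each input file and close ARGV to reset $. before the next one. The sub name column_means_per_file and the */data.txt pattern are illustrative assumptions, not part of the original question.

```perl
use strict;
use warnings;

# Sketch: per-file column means over many files with Perl's <> (ARGV) loop.
# column_means_per_file is a hypothetical helper name.
# Returns a hash: file name => array ref of column means.
sub column_means_per_file {
    my @files = @_;
    return () unless @files;        # an empty @ARGV would make <> read STDIN
    local @ARGV = @files;
    my (%means, @sum);
    while (my $line = <>) {
        my @fields = split ' ', $line;              # split on whitespace
        $sum[$_] += $fields[$_] for 0 .. $#fields;  # accumulate each column
        if (eof) {                  # eof without () = end of the current file
            my $n = $.;             # $. is this file's line count here
            $means{$ARGV} = [ map { $_ / $n } @sum ];
            @sum = ();              # start fresh for the next file
            close ARGV;             # resets $. before the next file is read
        }
    }
    return %means;
}

# Example: every data.txt one level below the current directory.
my %means = column_means_per_file(glob "*/data.txt");
for my $file (sort keys %means) {
    print "$file\t", join("\t", @{ $means{$file} }), "\n";
}
```

Run from the main directory, it prints one line per file: the file name followed by that file's tab-separated column means. Unlike the answers below, this version divides by every line read, blank ones included.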
Answer:
As your comments imply, you have a simple directory structure: one main directory called mean with hundreds of subdirectories, each containing a file called data.txt. So the list of files can be compiled easily with a glob, and the math is fairly straightforward. Here is one way to do it.
I would not use $. to calculate the average, since it counts every line read from whichever filehandle was read last, blank lines included. Instead, use a separate count variable for each file and count only the non-blank lines.
use strict;
use warnings;
use feature 'say';

for my $data (glob "mean/*/data.txt") {       # get list of files
    open my $fh, '<', $data or die "Cannot open file '$data': $!";
    my @sum;
    my $count = 0;
    while (<$fh>) {
        $count++ if /\S/;                      # count non-blank lines
        my @fields = split;                    # split on whitespace
        for (0 .. $#fields) {
            $sum[$_] += $fields[$_];           # sum columns
        }
    }
    next unless $count;                        # skip empty files (avoids division by zero)
    say $data;                                 # file name
    say join "\t",                             # 3. ...join them with tab and print
        map $_/$count,                         # 2. ...for each sum, divide by count
        @sum;                                  # 1. Take list of sums...
}
Output:
mean/A/data.txt
-72.354355 0.552015 56.297505 1.77814 -0.845845 0.029485 -16.88618 1.408235
mean/B/data.txt
-142.354355 0.552015 56.297505 1.77814 -0.845845 0.029485 -16.88618 1.408235
mean/C/data.txt
-72.354355 17.152015 56.297505 1.77814 -0.845845 0.029485 -16.88618 1.408235
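To illustrate why counting non-blank lines matters, here is a small sketch that computes column means from a list of lines and skips blank ones; column_means is a hypothetical helper name. For the lines "1 2", "" and "3 4" it divides by 2, giving 2 and 3. Dividing by the total line count of 3, which is what $. would report after reading those lines, would instead give 4/3 and 2.

```perl
use strict;
use warnings;

# Column means over a list of lines, counting only non-blank lines.
# column_means is a hypothetical helper name used for illustration.
sub column_means {
    my @lines = @_;
    my @sum;
    my $count = 0;
    for my $line (@lines) {
        next unless $line =~ /\S/;          # skip blank lines entirely
        $count++;
        my @fields = split ' ', $line;      # split on whitespace
        $sum[$_] += $fields[$_] for 0 .. $#fields;
    }
    return () unless $count;                # no data: avoid division by zero
    return map { $_ / $count } @sum;
}

my @means = column_means("1 2\n", "\n", "3 4\n");
print join(" ", @means), "\n";              # prints: 2 3
```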
Answer:
I am not a Perl expert, but this worked for me. It prints the results to the terminal; you could redirect that output to a file, or have the script write to a file directly.
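use 5.28.2;                          # also enables strict and "say"
use warnings;
use File::Find;

my ($inf, @sum);
for my $dir (glob "/mainDirectory/*") {  # finds files/subdirectories
    if (! -d $dir) {
        next;                        # keeps only directories
    }
    $inf = "$dir/data.txt";
    say $inf;
    find(\&sum_columns, $inf);       # calls sum_columns once for this file
}

sub sum_columns {
    open(my $in, "<", $inf) or die "Cannot open file '$inf': $!\n";
    my $count = 0;
    while (my $line = <$in>) {
        chomp $line;
        next unless $line =~ /\S/;   # skip blank lines
        $count++;
        my @columns = split(' ', $line);     # split on whitespace
        for my $item (0 .. $#columns) {
            $sum[$item] += $columns[$item];  # sum columns
        }
    }
    close $in;
    say join " ", map { $_ / $count } @sum if $count;   # print the means
    @sum = ();
}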
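Both answers assume data.txt sits exactly one level below the main directory. If the files could be nested deeper, a sketch using File::Find's usual callback style could collect them first. Inside the callback, $_ is the bare file name and $File::Find::name is the path starting from the top directory. The helper name find_data_files and the top directory mean are assumptions for illustration; each path found could then be handed to either answer's summing code.

```perl
use strict;
use warnings;
use File::Find;

# Collect every regular file named data.txt below $top, at any depth.
# find_data_files is a hypothetical helper name.
sub find_data_files {
    my ($top) = @_;
    my @found;
    find(sub {
        # File::Find chdirs into each directory, so $_ is the bare name
        # and $File::Find::name is the path starting from $top.
        push @found, $File::Find::name if -f $_ && $_ eq 'data.txt';
    }, $top);
    return sort @found;
}

# Example: list the files under ./mean, if that directory exists.
if (-d 'mean') {
    print "$_\n" for find_data_files('mean');
}
```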