How to cut, sum each line and reshape data with bash (read input to which file)

Time:07-30

I have a file "inp" containing the names of data files I want to process (~200). Part of the file is:

vt-3.00-0.02-80-160k-sphere.out
vt-3.00-0.04-80-160k-sphere.out
vt-3.00-0.06-80-160k-sphere.out
vt-3.00-0.08-80-160k-sphere.out
vt-3.00-0.10-80-160k-sphere.out

Each datafile contains repeating blocks of data, and I want to manipulate a certain part of it:

An example of my repeating data-block:

ST 80 102089
NA 25344 156787 373708 161510 159938 1.235805 9277 482431400
NB 7108 14482 40970 83521 158694
MV 94.577381 337.419276 1.015652 7.253154
VT 1 181 9 7 29 62 580
VT 2 29 12 12 486 13 341
VT 3 486 7 5 59 156 256
VT 4 59 5 5 1199 22 1229
VT 5 1199 10 18 725 379 777
VT 6 725 14 24 587 225 1230
VT 7 587 18 10 183 187 799
VT 8 183 5 5 179 63 629
VT 9 179 5 5 8988 61 9277
VT 10 8988 5 7 1310 2824 1329
VT 11 1310 7 7 2317 417 9088
VT 12 2317 11 5 774 729 1330
VT 13 774 9 9 58 248 2089
VT 14 58 9 5 1949 22 2033
VT 15 1949 7 7 324 607 295
VT 16 324 5 17 3419 106 3097
VT 17 3419 11 13 364 1067 544
VT 18 364 5 5 108 118 1913
VT 19 108 9 7 364 35 427
VT 20 364 23 15 545 116 458
VT 21 545 27 17 727 172 591
VT 22 727 20 16 254 229 629
VT 23 254 5 7 520 84 837
VT 24 520 9 9 3268 165 3323
VT 25 3268 7 5 11 1028 552
VT 26 11 6 14 573 7 3143
VT 27 573 7 13 1492 185 1583
VT 28 1492 6 8 48 467 675
VT 29 48 9 5 48 19 1529
VT 30 48 13 35 2347 19 2164
VT 31 2347 5 13 111 743 318
VT 32 111 6 16 491 39 2451
VT 33 491 11 7 47 154 160
VT 34 47 7 9 1679 18 1724
VT 35 1679 11 7 3650 533 3741
VT 36 3650 6 12 568 1152 1742
VT 37 568 7 11 975 182 3765
VT 38 975 5 5 17 307 602
VT 39 17 6 8 953 9 990
VT 40 953 8 12 439 303 474
VT 41 439 13 9 2183 141 2215
VT 42 2183 5 5 17 684 450
VT 43 17 13 9 184 9 2205
VT 44 184 5 7 65 61 170
VT 45 65 5 7 1136 24 848
VT 46 1136 5 5 2056 360 2071
VT 47 2056 6 8 625 643 1155
VT 48 625 27 17 1658 203 2097
VT 49 1658 8 14 385 529 744
VT 50 385 5 9 618 120 1987
VT 51 618 5 5 439 198 582
VT 52 439 7 5 284 140 631
VT 53 284 7 5 1225 93 1244
VT 54 1225 9 5 1024 391 1129
VT 55 1024 5 15 2504 326 2515
VT 56 2504 5 5 338 794 1040
VT 57 338 6 6 3299 107 3315
VT 58 3299 12 10 682 1042 783
VT 59 682 14 16 441 220 3241
VT 60 441 9 17 132 144 709
VT 61 132 15 7 2941 46 3054
VT 62 2941 13 9 492 926 536
VT 63 492 16 26 2750 159 2964
VT 64 2750 13 5 528 858 761
VT 65 528 12 16 748 166 2588
VT 66 748 5 5 1508 238 1547
VT 67 1508 5 5 153 477 799
VT 68 153 6 10 184 51 1535
VT 69 184 18 8 355 60 458
VT 70 355 8 10 668 113 644
VT 71 668 12 12 919 215 1043
VT 72 919 8 14 489 293 688
VT 73 489 8 10 619 156 1025
VT 74 619 14 10 21 198 523
VT 75 21 18 24 1514 10 1789
VT 76 1514 16 12 3792 477 3803
VT 77 3792 9 5 36 1188 1585
VT 78 36 9 9 74 15 3896
VT 79 74 7 9 517 27 343
VT 80 517 14 6 181 170 202

I'm interested only in the lines that start with VT. I want to omit the first number (the index 1...80) after VT, sum the following 4 numbers, and then place all 80 summed values on one line, separated by spaces. After some searching I ended up with the following code:

#!/bin/sh

while read f ; do
  fred=${f}.red
  #fred=`echo "$f" | sed 's/...$/red/'`
  N=1600000
  echo "Processing $f file to ${fred} ..."
  grep "^VT" "$f" | tail -$N | cut -d ' ' -f 3,4,5,6 | sed -e 's/\([0-9.]\) /\1+/g' -e 's/.*/(&)/' | bc | tr -d ',' | paste -d' ' $(printf -- '- %.s' {1..80}) > "${fred}"
done < inp

The loop reads the file "inp" and takes the listed filenames one by one; the file currently being processed is "$f". For "$f" it greps all the lines starting with VT, takes the last N lines (20k * 80), cuts out the 3rd...6th fields, transforms them into (n1+n2+n3+n4), and uses bc to sum them. Then printf prints '-' 80 times as arguments for paste, and paste fills each '-' with one of the next 80 numbers (this is a funny trick actually).
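To make the summing stages concrete, here is a minimal sketch that pushes a single VT line from the sample block through cut and sed and, if bc is installed, evaluates the result. The variable names line and expr are only for illustration, and the sed expression assumes a '+' between the captured digit and the next field, which is what bc needs in order to add the values.

```shell
# One VT line taken from the sample data block
line='VT 1 181 9 7 29 62 580'

# Keep fields 3..6, join them with '+', and wrap the result in parentheses
expr=$(printf '%s\n' "$line" | cut -d ' ' -f 3,4,5,6 | sed -e 's/\([0-9.]\) /\1+/g' -e 's/.*/(&)/')
echo "$expr"    # prints (181+9+7+29)

# bc evaluates the expression; skipped when bc is not available
if command -v bc >/dev/null 2>&1; then
    echo "$expr" | bc    # prints 226
fi
```

The value 226 is the sum for the first VT line, so each VT line contributes exactly one number to the stream that paste later reshapes.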

It seems to work if I use an individual file in the terminal, but in this loop, it doesn't do the last part (reshaping).

Expected output (created via my code performed on an individual file) with tail -166:

1480 5223 3839 135 742 886 280 533 537 1318 1951 1312 831 370 9040 10260 3820 3282 898 2099 2402 3557 3584 450 479 948 1370 1148 844 3917 3414 541 1892 1430 117 2335 2367 634 613 1690 5331 4278 1530 975 929 1331 2654 2245 251 338 1278 3161 2604 2381 2094 914 1164 856 1597 2317 3603 2946 3661 4082 1175 611 3044 3376 3133 3142 1362 2319 1700 328 520 1039 1647 1569 1209 636
1489 5179 3780 135 748 858 226 539 557 1268 1952 1350 798 372 9177 10310 3641 3107 850 2021 2287 3765 3807 482 488 947 1316 1017 786 3806 3291 604 2085 1554 110 2443 2476 624 556 1742 5347 4236 1561 1002 984 1412 2644 2210 223 261 1213 3202 2695 2327 2065 1017 1067 735 1521 2263 3548 2852 3649 4003 1153 599 3095 3455 3284 3296 1304 2266 1671 353 565 1041 1611 1430 1126 664

Resulting output (using the loop over all files):

1480
5223
3839
135
742
886
280
533
537
1318
1951
1312
831
370
9040
10260
3820
3282
898
2099
2402
3557
3584
450
479
948
1370
1148
844
3917
3414
541
1892
1430
117
2335
2367
634
613
1690
5331
4278
1530
975
929
1331
2654
2245
251
338
1278
3161
2604
2381
2094
914
1164
856
1597
2317
3603
2946
3661
4082
1175
611
3044
3376
3133
3142
1362
2319
1700
328
520
1039
1647
1569
1209
636
1489
5179
3780
135
748
858
226
539
557
1268
1952
1350
798
372
9177
10310
3641
3107
850
2021
2287
3765
3807
482
488
947
1316
1017
786
3806
3291
604
2085
1554
110
2443
2476
624
556
1742
5347
4236
1561
1002
984
1412
2644
2210
223
261
1213
3202
2695
2327
2065
1017
1067
735
1521
2263
3548
2852
3649
4003
1153
599
3095
3455
3284
3296
1304
2266
1671
353
565
1041
1611
1430
1126
664
1577
5334
3842
128
607
718

My question:

  • Is there a more efficient way to do the sum and reshape I am looking for?
  • Why does it not work properly in the loop?

CodePudding user response:

I tested your code on an Ubuntu server and it works, but I was using bash, not sh, so try changing

#!/bin/sh 

with

#!/bin/bash 

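Using sh on my server I get the same messy result as you.

On Ubuntu, sh (/bin/sh) just points to dash (Debian Almquist shell), which is the default command interpreter there. dash does not perform the {1..80} brace expansion, so printf prints only a single '-' and paste puts one value on each line instead of 80.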
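Brace expansion such as {1..80} is a bash feature that dash does not perform, so another option is to keep #!/bin/sh and generate the '-' arguments without it. The sketch below is one way to do that, assuming seq is available (it is part of GNU coreutils on Ubuntu); the function name reshape is hypothetical.

```shell
# reshape N: join every N lines of stdin into one space-separated row.
reshape() {
    # seq supplies N arguments, so printf repeats '- ' N times in dash and bash alike.
    # The command substitution is left unquoted on purpose so that it
    # splits into N separate '-' arguments for paste.
    paste -d' ' $(printf -- '- %.s' $(seq 1 "$1"))
}

# Small demonstration: eight lines become two rows of four
seq 1 8 | reshape 4
```

In the original pipeline the last stage would then read `| reshape 80 > "${fred}"`, with everything before it unchanged.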

Here is a suggestion in awk (not exactly equivalent to your pipeline, since it reads every VT line rather than only the last N):

/^VT/ {
    sum = $3 + $4 + $5 + $6;          # add the four values after the VT index
    if (lines == "") {
        lines = sum;
    } else {
        lines = lines " " sum;
    }
    if (++count % nlineconcat == 0) { # count VT lines only; NR would also count ST/NA/NB/MV lines
        print lines;
        lines = "";
    }
}
END {
    if (lines != "") print lines;     # flush an incomplete last row, if any
}

Put this code in a file program.awk and try:

awk -v nlineconcat=80 -f program.awk *.out

where nlineconcat is the number of summed values that will be joined into one output line (80 for your data blocks).

Or you can run it as a one-liner:

awk -v nlineconcat=80 '/^VT/ { sum = $3 + $4 + $5 + $6; if (lines == "") { lines = sum; } else { lines = lines " " sum; } if (++count % nlineconcat == 0) { print lines; lines = ""; } } END { if (lines != "") print lines; }' *.out

awk is a powerful tool for manipulating text files.
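To get one .red file per input file, as in the question, the awk step can be wrapped in the same while loop over "inp". This is only a sketch under a few assumptions: the function name sum_vt is made up, the row width defaults to 80, and the tail -N step is left out, so every VT line in a file is used.

```shell
# sum_vt FILE [N]: print the sums of fields 3..6 of the VT lines, N per row (default 80)
sum_vt() {
    awk -v n="${2:-80}" '
        /^VT/ {
            # a space before every value except the first one in a row
            printf "%s%s", (c % n ? " " : ""), $3 + $4 + $5 + $6
            if (++c % n == 0) print ""      # row is complete: end the line
        }
        END { if (c % n) print "" }         # end an incomplete last row
    ' "$1"
}

# Process every file listed in "inp", writing FILE.red next to FILE
if [ -f inp ]; then
    while IFS= read -r f; do
        [ -f "$f" ] || continue
        echo "Processing $f file to $f.red ..."
        sum_vt "$f" > "$f.red"
    done < inp
fi
```

This runs one awk process per file instead of the grep, cut, sed, bc, tr and paste chain, and it behaves the same under sh and bash because it does not rely on brace expansion.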
