How do I get AWK to rearrange and manipulate text in a file into two output files depending on a condition?

Time:12-10

I tried to find an efficient way to split the text in one file and recombine it into two separate files. There is a lot going on: removing the decimal point, reversing the sign of the amount fields (a positive amount gets a leading -, a negative amount loses its - and gets a leading space) and zero-padding. For example:

INPUT file input.txt:

(The first line below is only a ruler that shows character positions so you don't have to count; it is not present in the input file. Likewise, the "|" marks are there only to illustrate positions.)

1234567890123456789012345678901234567890123456789012345678901234567890123456789012345
           |         |             |     |               |                    |     |  ("|" shows position)
123456789XXPPPPPPPPPP              NNNNNN#1404.58        #0.00                0     1
987654321YYQQQQQQQQQQ              NNNNNN#-97.73         #-97.73              1     1
777777777XXGGGGGGGGGG              NNNNNN#115.92         #115.92              0     0
888888888YYHHHHHHHHHH              NNNNNN#3.24           #3.24                1     0

Any line whose 85th character (as numbered above) is "1" goes to one file, say OutputA.txt, rearranged like this:

PPPPPPPPPP~~NNNNNN123456789XX~~~-0000140458-0000000000
QQQQQQQQQQ~~NNNNNN987654321YY~~~ 0000009773 0000009773

Likewise, any line whose 85th character is "0" goes to another file, OutputB.txt, rearranged like this:

GGGGGGGGGG~~NNNNNN777777777XX~~~-0000011592-0000011592
HHHHHHHHHH~~NNNNNN888888888YY~~~-0000000324-0000000324

It seems complicated, but it should work if I could just grab each portion of the input line as a separate variable, write the portions out in a different order with the amounts right-aligned and padded with 0s, and split the lines into different files depending on the last column. I'm just not sure how to put all these things together in one go.

I tried printing each line to a different file depending on whether the 85th character is a 1 or 0, and then creating variables, say characters 1 to 11 as varA, the next 10 as varB, etc. But it gets complex quickly, because I also need to reverse the sign of the amounts (positive to -, negative to a space), pad with zeros and change the spacing. This should be possible with one script, but I just can't put all the pieces together.

I've looked for tutorials, but none seems to cover extracting fields based on a condition while at the same time padding, rearranging, splitting, etc.

Many thanks in advance

Answer:

split

Use GNU AWK's ability to print to a file (output redirection with >). Consider the following simple example:

seq 20 | awk '$1%2==1{print $0 > "fileodd.txt"}$1%2==0{print $0 > "fileeven.txt"}'

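This reads the output of seq 20 (the numbers from 1 to 20 inclusive, each on a separate line) and puts odd numbers in fileodd.txt and even numbers in fileeven.txt. Within one awk run, > truncates a file only when it is first opened; later prints to the same name append to it.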
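Applied to the question, the condition is the character in column 85 rather than a whole field, and substr can pick it out. A minimal sketch, assuming the input lines are laid out exactly as in the question's position ruler (input.txt, OutputA.txt and OutputB.txt are the question's own file names):

```shell
#!/bin/sh
# Sample input copied from the question (85 characters per line).
cat > input.txt <<'EOF'
123456789XXPPPPPPPPPP              NNNNNN#1404.58        #0.00                0     1
987654321YYQQQQQQQQQQ              NNNNNN#-97.73         #-97.73              1     1
777777777XXGGGGGGGGGG              NNNNNN#115.92         #115.92              0     0
888888888YYHHHHHHHHHH              NNNNNN#3.24           #3.24                1     0
EOF

# substr($0, 85, 1) is the single character in column 85.
awk '
  substr($0, 85, 1) == "1" { print > "OutputA.txt" }
  substr($0, 85, 1) == "0" { print > "OutputB.txt" }
' input.txt
```

Both files are written in a single pass over input.txt, using the same pattern-action structure as the seq example.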

recombine text

Use substr and string concatenation for that task. Consider the following simple example: say you have file.txt with MM-DD-YYYY dates like so

01-29-2022
01-30-2022
01-31-2022

but you want YYYY-MM-DD. Then you could do that with

awk '{print substr($0,7,4) "-" substr($0,1,2) "-" substr($0,4,2)}' file.txt

which gives the output

2022-01-29
2022-01-30
2022-01-31

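The substr arguments are the string ($0 is the whole line), the start position (counting from 1) and the length. Placing strings next to each other, here separated by spaces, concatenates them; AWK has no explicit concatenation operator.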
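The same technique rebuilds the text part of the question's output. A sketch assuming the column positions from the question's ruler (123456789XX in columns 1-11, PPPPPPPPPP in 12-21, NNNNNN starting at column 36); line.txt and key.txt are only illustrative file names:

```shell
#!/bin/sh
# The first line of the question's input (column positions as in its ruler).
printf '%s\n' '123456789XXPPPPPPPPPP              NNNNNN#1404.58        #0.00                0     1' > line.txt

# columns 12-21, "~~", columns 36-41, columns 1-11, "~~~"
awk '{ print substr($0, 12, 10) "~~" substr($0, 36, 6) substr($0, 1, 11) "~~~" }' line.txt > key.txt
cat key.txt
```

This prints PPPPPPPPPP~~NNNNNN123456789XX~~~, the part of the first expected OutputA.txt line that comes before the amounts.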

removing the decimal point

Use gsub with the second argument set to an empty string to delete unwanted characters, but keep in mind that . has a special meaning in regular expressions. Consider the following simple example; let file.txt contain

100.15
200.30
300.45

then

awk '{gsub(/[.]/,"");print}' file.txt

gives the output

10015
20030
30045

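Observe that /[.]/, not /./, is used: /./ matches any character and would delete everything. Also note that gsub changes its target (by default $0) in place and returns the number of substitutions, not the new string.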
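In the question only the amounts should lose their decimal point, not the whole line. gsub accepts an optional third argument naming the variable to modify, so a sketch that splits on # and edits just the amount could look like this (fields.txt, stripped.txt and the variable amount are illustrative names):

```shell
#!/bin/sh
# Two fields in the question's NNNNNN#amount shape.
printf '%s\n' 'NNNNNN#1404.58' 'NNNNNN#-97.73' > fields.txt

# -F'#' splits each line at "#"; the third gsub argument names the variable
# to edit, so only the amount loses its decimal point.
awk -F'#' '{ amount = $2; gsub(/[.]/, "", amount); print $1, amount }' fields.txt > stripped.txt
cat stripped.txt
```

The output is NNNNNN 140458 and NNNNNN -9773; $0 and $1 are left untouched.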

reversing the sign(...)padding

Multiply by -1, then use sprintf with a suitable format. Consider the following example; let file.txt contain

1
-10
100

then

awk '{print "Reversed value is " sprintf("% 05d",-1*$1)}' file.txt

gives the output

Reversed value is -0001
Reversed value is  0010
Reversed value is -0100

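Explanation: % starts the conversion, i.e. marks the place where the value will be inserted; the space flag prefixes the number with - if it is negative and with a space otherwise; 05 pads with leading zeros to a width of 5 characters (the sign counts towards the width); d formats the value as an integer. sprintf returns the formatted string, which can be concatenated with other strings as shown above.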

(tested in GNU Awk 5.0.1)
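Putting these pieces together, here is a sketch of a single awk program that splits, rearranges, strips the decimal point, reverses the sign and pads in one pass. It assumes the fixed columns from the question's ruler and exactly two decimal places in every amount; fmt is a hypothetical helper name. The sign is chosen from the text instead of multiplying by -1, because -1 * 0 formatted with % d would most likely print a leading space, whereas the question wants #0.00 to become -0000000000.

```shell
#!/bin/sh
# Sample input copied from the question; every line is 85 characters wide.
cat > input.txt <<'EOF'
123456789XXPPPPPPPPPP              NNNNNN#1404.58        #0.00                0     1
987654321YYQQQQQQQQQQ              NNNNNN#-97.73         #-97.73              1     1
777777777XXGGGGGGGGGG              NNNNNN#115.92         #115.92              0     0
888888888YYHHHHHHHHHH              NNNNNN#3.24           #3.24                1     0
EOF

awk '
# fmt is a hypothetical helper. It assumes every amount has exactly two
# decimal places, as in the sample, so deleting the "." multiplies by 100.
function fmt(field,    sign) {
    gsub(/[ #]/, "", field)             # drop the "#" marker and padding blanks
    sign = (field ~ /^-/) ? " " : "-"   # reversed sign: negative -> space, otherwise -
    sub(/^-/, "", field)                # keep only the digits and the dot
    gsub(/[.]/, "", field)              # remove the decimal point
    return sign sprintf("%010d", field) # zero-pad to 10 digits
}
{
    key  = substr($0, 12, 10) "~~" substr($0, 36, 6) substr($0, 1, 11) "~~~"
    out  = key fmt(substr($0, 42, 16)) fmt(substr($0, 58, 21))
    flag = substr($0, 85, 1)
    if (flag == "1")      print out > "OutputA.txt"
    else if (flag == "0") print out > "OutputB.txt"
}' input.txt
```

With the sample input, OutputA.txt and OutputB.txt should contain exactly the two lines each that the question lists as expected output.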

Answer:

You can use jq for this task:

#!/bin/bash

INPUT='
123456789XXPPPPPPPPPP              NNNNNN#1404.58        #0.00                0     1
987654321YYQQQQQQQQQQ              NNNNNN#-97.73         #-97.73              1     1
777777777XXGGGGGGGGGG              NNNNNN#115.92         #115.92              0     0
888888888YYHHHHHHHHHH              NNNNNN#3.24           #3.24                1     0
'

convert() {
  jq -rR --arg lineSelector "$1" '
  def transformNumber($len):
    tonumber |                                    # convert string to number
    (if . < 0 then " " else "-" end) as $sign |   # store inverted sign
    if . < 0 then 0 - . else . end |              # abs(number)
    . * 100 | round |                             # number * 100, rounded to drop float residue
    tostring |                                    # convert number back to string
    $sign + "0" * ($len - length) + .;            # pad with leading zeros

  # Main program
    split(" ") |                                  # split each line by space
    map(select(length > 0)) |                     # remove empty entries
    select(.[4] == $lineSelector) |               # keep only lines with the selected value in last column

    # generate output                               # example for first line
    .[0][11:21] +                                   # PPPPPPPPPP
    "~~" +                                          # ~~
    (.[1] | split("#")[0]) +                        # NNNNNN
    .[0][0:11] +                                    # 123456789XX
    "~~~" +                                         # ~~~
    (.[1] | split("#")[1] | transformNumber(10)) +  # -0000140458
    (.[2] | split("#")[1] | transformNumber(10))    # -0000000000
  ' <<< "$2"
}

convert 0 "$INPUT"   # or convert 1 "$INPUT"

Output for 0

GGGGGGGGGG~~NNNNNN777777777XX~~~-0000011592-0000011592
HHHHHHHHHH~~NNNNNN888888888YY~~~-0000000324-0000000324

Output for 1

PPPPPPPPPP~~NNNNNN123456789XX~~~-0000140458-0000000000
QQQQQQQQQQ~~NNNNNN987654321YY~~~ 0000009773 0000009773