I am trying to find an efficient way to split the text of one file into two separate files and rearrange it along the way. There is a lot going on: removing the decimal point, reversing the sign of the amount fields (positive becomes negative and negative becomes positive) and padding. For example:
Input file input.txt:
(The first line below is only there to show character positions so you don't have to count; it is not present in the input file. The "|" characters just illustrate positions.)
1234567890123456789012345678901234567890123456789012345678901234567890123456789012345
| | | | | | | ("|" shows position)
123456789XXPPPPPPPPPP NNNNNN#1404.58 #0.00 0 1
987654321YYQQQQQQQQQQ NNNNNN#-97.73 #-97.73 1 1
777777777XXGGGGGGGGGG NNNNNN#115.92 #115.92 0 0
888888888YYHHHHHHHHHH NNNNNN#3.24 #3.24 1 0
Any line whose 85th character (per the ruler above) is "1" goes to one file, say OutputA.txt, rearranged like this:
PPPPPPPPPP~~NNNNNN123456789XX~~~-0000140458-0000000000
QQQQQQQQQQ~~NNNNNN987654321YY~~~ 0000009773 0000009773
Any line whose 85th character is "0" goes to another file, OutputB.txt, rearranged like this:
GGGGGGGGGG~~NNNNNN777777777XX~~~-0000011592-0000011592
HHHHHHHHHH~~NNNNNN888888888YY~~~-0000000324-0000000324
It seems complicated, but it should work if I could grab each portion of an input line as a separate variable, write the portions out in a different order with the amounts right-aligned and padded with zeros, and send each line to a different file depending on the last column. I'm just not sure how to put all of this together in one go.
I tried writing each line to a different file depending on whether the 85th character is a 1 or a 0, and then creating variables, e.g. characters 1 to 11 as varA, the next 10 as varB, and so on. But it gets complex quickly, because I also need to reverse the signs, pad with zeros and change the spacing. This should be possible with one script, but I just can't put all the pieces together.
I've looked for tutorials, but none seem to cover selecting lines by a condition while also padding, rearranging and splitting them.
Many thanks in advance
Answer:
split
Use GNU AWK's ability to print to a file. Consider the following simple example:
seq 20 | awk '$1%2==1{print $0 > "fileodd.txt"}$1%2==0{print $0 > "fileeven.txt"}'
which reads the output of seq 20 (the numbers from 1 to 20 inclusive, each on a separate line) and puts the odd numbers into fileodd.txt and the even numbers into fileeven.txt.
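Applied to the question's rule, the same idea might look like the sketch below. It assumes the real input.txt is truly fixed-width, so the 0/1 flag is the 85th character (the sample lines in the question appear to have lost their padding, so substr($0, 85, 1) would come back empty for them); the two demo records generated with printf are hypothetical stand-ins.

```shell
#!/bin/sh
# Hypothetical demo records: printf pads each text to 84 characters and then
# appends the flag, so the flag lands exactly on character 85.
printf '%-84s%s\n' "first record" 1 "second record" 0 |
awk '
substr($0, 85, 1) == "1" { print > "OutputA.txt" }   # flag 1 goes to OutputA.txt
substr($0, 85, 1) == "0" { print > "OutputB.txt" }   # flag 0 goes to OutputB.txt
'
```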
recombine text
Use substr and string concatenation for that task. Consider the following simple example: say you have file.txt with MM-DD-YYYY dates like so
01-29-2022
01-30-2022
01-31-2022
but you want YYYY-MM-DD; then you could do it with
awk '{print substr($0,7,4) "-" substr($0,1,2) "-" substr($0,4,2)}' file.txt
which gives output
2022-01-29
2022-01-30
2022-01-31
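The arguments of substr are the string ($0 is the whole line), the start position and the length. awk has no explicit concatenation operator: expressions written next to each other (here separated by a space) are concatenated.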
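On the question's data, the first 21 characters split cleanly into the 11-character id (123456789XX) and the 10-character PPPPPPPPPP block, so a first step could look like this minimal sketch. It only covers those two fields, since the exact columns of the later fields cannot be read from the sample as shown.

```shell
#!/bin/sh
# First sample record from the question; characters 1-11 are the id and
# characters 12-21 are the PPPPPPPPPP block.
record='123456789XXPPPPPPPPPP NNNNNN#1404.58 #0.00 0 1'
# Print characters 12-21, then "~~", then characters 1-11.
result=$(printf '%s\n' "$record" | awk '{ print substr($0, 12, 10) "~~" substr($0, 1, 11) }')
echo "$result"   # PPPPPPPPPP~~123456789XX
```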
removing the decimal point
Use gsub with the second argument set to an empty string to delete unwanted characters, but keep in mind that . has a special meaning in a regular expression (it matches any single character). Consider the following simple example, where file.txt contains
100.15
200.30
300.45
then
awk '{gsub(/[.]/,"");print}' file.txt
gives output
10015
20030
30045
Observe that /[.]/, not /./, is used, and that gsub changes its target in place (the whole line $0 when no third argument is given).
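A quick way to see why the brackets matter is to look at the value gsub returns, which is the number of substitutions it made. This sketch runs both patterns on one of the question's amounts; an escaped /\./ would behave like /[.]/.

```shell
#!/bin/sh
# gsub returns the number of substitutions it made.
with_brackets=$(printf '%s\n' "1404.58" | awk '{ n = gsub(/[.]/, ""); print $0 " (" n " removed)" }')
with_dot=$(printf '%s\n' "1404.58" | awk '{ n = gsub(/./, ""); print "[" $0 "] (" n " removed)" }')
negative=$(printf '%s\n' "-97.73" | awk '{ gsub(/[.]/, ""); print }')
echo "$with_brackets"   # 140458 (1 removed)
echo "$with_dot"        # [] (7 removed): /./ matched every character
echo "$negative"        # -9773: the minus sign is kept
```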
reversing the sign(...)padding
Multiply by -1, then use sprintf with a suitable format. Consider the following example, where file.txt contains
1
-10
100
then
awk '{print "Reversed value is " sprintf("% 05d",-1*$1)}' file.txt
gives output
Reversed value is -0001
Reversed value is  0010
Reversed value is -0100
Explanation of the format "% 05d":
% - the place where the value will be inserted
(space) - the space flag: prefix the number with - if it is negative, or with a space otherwise
05 - pad with leading zeros to a width of 5 characters
d - treat the value as an integer
sprintf returns the formatted string, which can be concatenated with other strings as shown above.
(tested in GNU Awk 5.0.1)
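Putting the pieces together, a single awk run can split and rearrange the whole file. The sketch below is one way to do it, under two assumptions: the fields are whitespace-separated as in the question's sample lines (so it routes on the last field $NF instead of character 85), and the amounts always have two decimals. Note one difference from the sprintf example above: the first sample line expects 0.00 to come out as -0000000000, and feeding -1*$1 to "% d" does not reliably give zero a leading "-", so the sign is chosen explicitly here, as the jq answer below also does. The function name fmt is only illustrative.

```shell
#!/bin/sh
# Sketch: lines whose last column is 1 go to OutputA.txt,
# lines whose last column is 0 go to OutputB.txt.
# Assumes whitespace-separated fields, as in the question's sample lines.
awk '
# Reverse the sign and zero-pad to 10 digits:
# zero or positive gets a leading "-", negative gets a leading space.
function fmt(amount,    sign) {
    gsub(/[.]/, "", amount)           # "1404.58" becomes "140458"
    amount += 0                       # string to number
    if (amount < 0) { sign = " "; amount = -amount } else sign = "-"
    return sign sprintf("%010d", amount)
}
{
    split($2, a, "#")                 # a[1] = NNNNNN, a[2] = first amount
    split($3, b, "#")                 # b[2] = second amount
    line = substr($1, 12, 10) "~~" a[1] substr($1, 1, 11) "~~~" fmt(a[2]) fmt(b[2])
    if ($NF == 1)      print line > "OutputA.txt"
    else if ($NF == 0) print line > "OutputB.txt"
}' <<'EOF'
123456789XXPPPPPPPPPP NNNNNN#1404.58 #0.00 0 1
987654321YYQQQQQQQQQQ NNNNNN#-97.73 #-97.73 1 1
777777777XXGGGGGGGGGG NNNNNN#115.92 #115.92 0 0
888888888YYHHHHHHHHHH NNNNNN#3.24 #3.24 1 0
EOF
```

To run it on the real file, replace the here-document with input.txt. For a genuinely fixed-width file, substr($0, 85, 1) would take the place of $NF, and substr calls at the real column positions would take the place of the two split() calls.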
Answer:
You can use jq for this task:
#!/bin/bash
INPUT='
123456789XXPPPPPPPPPP NNNNNN#1404.58 #0.00 0 1
987654321YYQQQQQQQQQQ NNNNNN#-97.73 #-97.73 1 1
777777777XXGGGGGGGGGG NNNNNN#115.92 #115.92 0 0
888888888YYHHHHHHHHHH NNNNNN#3.24 #3.24 1 0
'
convert() {
jq -rR --arg lineSelector "$1" '
def transformNumber($len):
tonumber | # convert string to number
(if . < 0 then " " else "-" end) as $sign | # store inverted sign
if . < 0 then 0 - . else . end | # abs(number)
. * 100 | round | # number * 100, rounded to guard against floating-point error
tostring | # convert number back to string
$sign + "0" * ($len - length) + .; # pad with leading zeros to $len digits
# Main program
split(" ") | # split each line by space
map(select(length > 0)) | # remove empty entries
select(.[4] == $lineSelector) | # keep only lines with the selected value in last column
# generate output # example for first line
.[0][11:21] + # PPPPPPPPPP
"~~" + # ~~
(.[1] | split("#")[0]) + # NNNNNN
.[0][0:11] + # 123456789XX
"~~~" + # ~~~
(.[1] | split("#")[1] | transformNumber(10)) + # -0000140458
(.[2] | split("#")[1] | transformNumber(10)) # -0000000000
' <<< "$2"
}
convert 0 "$INPUT" # or convert 1 "$INPUT"
Output for 0
GGGGGGGGGG~~NNNNNN777777777XX~~~-0000011592-0000011592
HHHHHHHHHH~~NNNNNN888888888YY~~~-0000000324-0000000324
Output for 1
PPPPPPPPPP~~NNNNNN123456789XX~~~-0000140458-0000000000
QQQQQQQQQQ~~NNNNNN987654321YY~~~ 0000009773 0000009773