I've seen tons of examples, but I can't get any of them to work with this script from https://stackoverflow.com/a/72720612
by another user, @Just Khaithang, on this site. It works great, but I also need to retain my column spacing, since that is critical.
This is the .txt file sample I have posted here a couple of times. There is 1 space at the beginning, column 2 starts 20 characters after the beginning of column 1, and there are 4 spaces between columns 2 and 3. See below for the script. It changes a date based on user input, hence the variable $broken_date. This script is called from another shell script with awk -v. The " " spaces in between work, but since the length of column 1 varies, the columns are not staying aligned.
146327A 0000000020220422 000002012633825-0003-1
137149D 0000000045220419 000004512632587-0003-0
137050C 0000000018220419 000001812632410-0003-0
137147A 0000000045220419 000004512632487-0003-0
137233B 0000000144220421 000014412630711-0003-1
137599B 0000000120220419 000012012632543-0003-0
137604D 0000000015220419 000001512632588-0003-0
151031-001E 0000000041220517 000004112575320-0003-1
151248-001A 0000000021220421 000002112629944-0003-1
151249-001A 0000000005220422 000000512634524-0003-1
151827-002B 0000000040220421 000004012629223-0003-1
127941B 0000000045220422 000004512634676-0003-1
137105A 0000000020220421 000002012630791-0003-1
132136A 0000000005220419 000000512632590-0003-0
132137A 0000000005220419 000000512632591-0003-0
134180D 0000000052220419 000006012622399-0003-1
134307-004K 0000000016220420 000001612635621-0003-0
141014-001B 0000000040220419 000004012632585-0003-0
{
    c2 = $2
    c3 = $3
    sub("0+", "", c2)             # strip the leading zeros
    sub("0+", "", c3)
    sub("-.*", "", c3)            # drop the -0003-1 style suffix
    if (length(c2) == 8) {
        c2_value = substr(c2, 1, 2)
    } else if (length(c2) == 9) {
        c2_value = substr(c2, 1, 3)
    }
    if (length(c3) == 10) {
        c3_value = substr(c3, 1, 2)
    } else if (length(c3) == 11) {
        c3_value = substr(c3, 1, 3)
    }
    if (c2_value != c3_value) {
        sub("[1-9].*$", "", $2)   # keep only the leading zeros of column 2
        # broken_date comes from user input, passed in with awk -v
        # the single " " separators here are what break the alignment
        print $1 " " $2 c2_value broken_date " " $3
    } else {
        print $0
    }
}
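Because the question describes a fixed layout (1 leading space, column 2 starting 20 characters after column 1, and 4 spaces after the 16 digits of column 2), one option that should work in any POSIX awk is to keep the script's logic and swap only the changed-line print for printf with field widths. This is a minimal sketch, not a tested fix for MKS awk: the widths are assumptions taken from that description, and fix_dates is a hypothetical wrapper name whose first argument is the new YYMMDD date. It also anchors the zero-stripping regex and resets c2_value/c3_value on every line, which the original does not do.

```shell
#!/bin/sh
# Sketch: same checks as the script above, but changed lines are written with
# printf widths instead of single spaces, so alignment no longer depends on
# the length of column 1.
# ASSUMPTION: 1 leading space, column 1 padded to 20 chars, column 2 padded
# to 20 chars (16 digits + 4 spaces), then column 3.
# fix_dates is a hypothetical name; $1 is the new YYMMDD date, input is stdin.
fix_dates() {
    awk -v broken_date="$1" '
    {
        c2 = $2; c3 = $3
        sub(/^0+/, "", c2)              # strip leading zeros
        sub(/^0+/, "", c3)
        sub(/-.*/, "", c3)              # drop the -0003-1 style suffix
        c2_value = ""; c3_value = ""    # reset so no value leaks from the previous line
        if (length(c2) == 8) c2_value = substr(c2, 1, 2)
        else if (length(c2) == 9) c2_value = substr(c2, 1, 3)
        if (length(c3) == 10) c3_value = substr(c3, 1, 2)
        else if (length(c3) == 11) c3_value = substr(c3, 1, 3)
        if (c2_value != c3_value) {
            zeros = $2
            sub(/[1-9].*$/, "", zeros)  # keep only the leading zeros of column 2
            printf " %-20s%-20s%s\n", $1, zeros c2_value broken_date, $3
        } else {
            print                        # unchanged lines are printed exactly as read
        }
    }'
}
```

For example: fix_dates 220909 < file > file.new (file names are placeholders). Lines that need no change keep whatever spacing they already had; only rewritten lines are re-padded to the assumed widths.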
The output should be:
146327A 0000000020220422 000002012633825-0003-1
137149D 0000000045220419 000004512632587-0003-0
137050C 0000000018220419 000001812632410-0003-0
137147A 0000000045220419 000004512632487-0003-0
137233B 0000000144220421 000014412630711-0003-1
137599B 0000000120220419 000012012632543-0003-0
137604D 0000000015220419 000001512632588-0003-0
151031-001E 0000000041220517 000004112575320-0003-1
151248-001A 0000000021220421 000002112629944-0003-1
151249-001A 0000000005220422 000000512634524-0003-1
151827-002B 0000000040220421 000004012629223-0003-1
127941B 0000000045220422 000004512634676-0003-1
137105A 0000000020220421 000002012630791-0003-1
132136A 0000000005220419 000000512632590-0003-0
132137A 0000000005220419 000000512632591-0003-0
134180D 0000000052220909 000006012622399-0003-1
134307-004K 0000000016220420 000001612635621-0003-0
141014-001B 0000000040220419 000004012632585-0003-0
The only difference is the date, which is exactly what it needs to change: in the 2nd column of the 3rd line from the bottom, where I entered 220909.
I am doing this in a Korn shell via MKS Toolkit; Awk says file version 9.2.3.2096. This is on an old Windows XP machine.
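In the calling script, the date has to reach awk through -v: inside a single-quoted awk program the shell never expands $broken_date, so a shell-style "$broken_date" written there is just a literal string. A minimal sketch showing the difference, using a hypothetical helper name show_date_handling whose first argument stands in for the user's input:

```shell
#!/bin/sh
# Sketch: how a shell value reaches awk.
# The awk program is in single quotes, so the shell never expands
# $broken_date inside it; only the -v assignment carries the value across.
# show_date_handling is a hypothetical helper; $1 plays the user input.
show_date_handling() {
    awk -v broken_date="$1" 'BEGIN {
        literal = "$broken_date"   # just the text $broken_date, no expansion
        print literal
        print broken_date          # the value assigned with -v
    }'
}
```

show_date_handling 220909 prints the literal text $broken_date on the first line and 220909 on the second. In the Korn shell wrapper, that corresponds to something like read broken_date followed by awk -v broken_date="$broken_date" -f yourscript.awk yourfile, where yourscript.awk and yourfile are placeholders.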
Answer:
This will behave the same way using any awk:
$ cat tst.sh
#!/usr/bin/env bash
broken_date='220909'
awk -v broken_date="$broken_date" '
    substr($2,4,7) != substr($3,1,7) {
        tail = $0
        nf = 0
        while ( tail != "" ) {
            match(tail,/^[ \t]*/)               # whitespace before the next field
            sep[++nf] = substr(tail,RSTART,RLENGTH)
            tail = substr(tail,RSTART+RLENGTH)
            match(tail,/^[^ \t]*/)              # the field itself
            fld[nf] = substr(tail,RSTART,RLENGTH)
            tail = substr(tail,RSTART+RLENGTH)
        }
        fld[2] = substr(fld[2],1,10) broken_date   # keep the first 10 chars, replace the date
        $0 = ""
        for ( i=1; i<=nf; i++ ) {
            $0 = $0 sep[i] fld[i]
        }
    }
    { print }
' "${@:--}"
$ ./tst.sh file
146327A 0000000020220422 000002012633825-0003-1
137149D 0000000045220419 000004512632587-0003-0
137050C 0000000018220419 000001812632410-0003-0
137147A 0000000045220419 000004512632487-0003-0
137233B 0000000144220421 000014412630711-0003-1
137599B 0000000120220419 000012012632543-0003-0
137604D 0000000015220419 000001512632588-0003-0
151031-001E 0000000041220517 000004112575320-0003-1
151248-001A 0000000021220421 000002112629944-0003-1
151249-001A 0000000005220422 000000512634524-0003-1
151827-002B 0000000040220421 000004012629223-0003-1
127941B 0000000045220422 000004512634676-0003-1
137105A 0000000020220421 000002012630791-0003-1
132136A 0000000005220419 000000512632590-0003-0
132137A 0000000005220419 000000512632591-0003-0
134180D 0000000052220909 000006012622399-0003-1
134307-004K 0000000016220420 000001612635621-0003-0
141014-001B 0000000040220419 000004012632585-0003-0
It just retains whatever spacing you already have. I made the script more general than necessary so you can see how to break an input record into arrays of separators (sep[]) and fields (fld[]), so you can do whatever you like with similar problems in the future.
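The separator/field split on its own, as a minimal sketch for experimenting: it prints each separator and field it finds, then checks that concatenating sep[i] fld[i] reproduces the original line. show_split is a hypothetical name; the loop body is the same one used in the script above.

```shell
#!/bin/sh
# Sketch: split each line into separators (sep[]) and fields (fld[]),
# print them, and confirm that rejoining them rebuilds the line exactly.
# show_split is a hypothetical name; input is read from stdin.
show_split() {
    awk '{
        tail = $0
        nf = 0
        while (tail != "") {
            match(tail, /^[ \t]*/)                  # leading whitespace
            sep[++nf] = substr(tail, RSTART, RLENGTH)
            tail = substr(tail, RSTART + RLENGTH)
            match(tail, /^[^ \t]*/)                 # following non-whitespace run
            fld[nf] = substr(tail, RSTART, RLENGTH)
            tail = substr(tail, RSTART + RLENGTH)
        }
        rebuilt = ""
        for (i = 1; i <= nf; i++) {
            printf "%d: sep=<%s> fld=<%s>\n", i, sep[i], fld[i]
            rebuilt = rebuilt sep[i] fld[i]
        }
        print (rebuilt == $0 ? "roundtrip ok" : "roundtrip FAILED")
    }'
}
```

For example, printf '  a   bb c  \n' | show_split reports four pairs, the last with an empty field holding the trailing spaces, which is why nf here can be larger than awk's own NF.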
Answer:
Assumptions:
- GNU awk/FIELDWIDTHS is available to OP (in comments OP mentions not being able to get FIELDWIDTHS to work, which I take to mean that OP is running GNU awk; otherwise I'd expect OP to say something about an error or FIELDWIDTHS not being available)
- input field widths are known in advance (eg, all inputs have the same spacing)
One idea for modifying OP's current code to work with GNU awk/FIELDWIDTHS:
broken_date='220909'
awk -v bdate="${broken_date}" '
BEGIN { FIELDWIDTHS="21 20 100"
        fmt="%-21s%-20s%s\n"              # define our printf format to match FIELDWIDTHS
      }
      { c2=$2; gsub(/ /,"",c2); sub("0+","",c2)
        c3=$3; gsub(/ /,"",c3); sub("0+","",c3); sub("-.*","",c3)
        if      (length(c2) ==  8) { c2_value=substr(c2,1,2) }
        else if (length(c2) ==  9) { c2_value=substr(c2,1,3) }
        if      (length(c3) == 10) { c3_value=substr(c3,1,2) }
        else if (length(c3) == 11) { c3_value=substr(c3,1,3) }
        if (c2_value != c3_value) { printf fmt,$1,substr($2,1,length(gensub(/ /,"","g",$2))-6) bdate,$3 }
        else                      { print $0 }
      }
' x > y
This generates:
$ diff x y
16c16
< 134180D 0000000052220419 000006012622399-0003-1
---
> 134180D 0000000052220909 000006012622399-0003-1
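FIELDWIDTHS and gensub() are GNU awk extensions, and the MKS awk mentioned in the question may not support them. A minimal sketch of a portable substitute, assuming the same 21/20/100 widths: split each line by character position with substr(), and replace gensub() with gsub() on a copy. split_fixed is a hypothetical name; it only prints what it extracted so the split can be checked.

```shell
#!/bin/sh
# Sketch: POSIX-awk stand-ins for the GNU awk features used in the answer above.
# ASSUMPTION: same layout as FIELDWIDTHS="21 20 100" (chars 1-21, 22-41, 42-141).
# split_fixed is a hypothetical name; input is read from stdin.
split_fixed() {
    awk '{
        f1 = substr($0, 1, 21)     # plays the role of $1 under FIELDWIDTHS
        f2 = substr($0, 22, 20)    # plays the role of $2
        f3 = substr($0, 42, 100)   # plays the role of $3
        d2 = f2
        gsub(/ /, "", d2)          # same result as gensub(/ /, "", "g", f2)
        printf "[%s][%s][%s] digits=%s\n", f1, f2, f3, d2
    }'
}
```

Inside the answer's script, f1, f2, f3 and d2 would then take the place of $1, $2, $3 and gensub(/ /,"","g",$2), and the BEGIN block would no longer need FIELDWIDTHS.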