I have a text file with a bunch of serial numbers that are supposed to be 16 characters long, but some of the records were damaged and are only 13 characters long. I want to add 3 zeros at the beginning of every serial number that is 13 characters long.
Note: The serial numbers don't start at the beginning of the line; they all start at column 15 of every line.
My file currently looks like this:
1:6822:26: :A:0000000999993: :DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :A:0000000999994: :MAT:13L324 : :01: : : :: : : :
1:6822:26: :A:0000000999995: :CAT:P13WFB : :01: : : ::0529483697940: : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170892: :AZDG-3 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170893: :AZDG-4 :0000003999:01:0000000000: : :: : : :
And the output should be:
1:6822:26: :A:0000000000999993: :DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :A:0000000000999994: :MAT:13L324 : :01: : : :: : : :
1:6822:26: :A:0000000000999995: :CAT:P13WFB : :01: : : ::0529483697940: : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170892: :AZDG-3 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170893: :AZDG-4 :0000003999:01:0000000000: : :: : : :
This is the code I made to get the records that are shortened:
#!/bin/bash
i=1
for OUTPUT in $(cut -c15-30 file.txt)
do
    if [[ ${#OUTPUT} == 13 ]]
    then
        echo "$OUTPUT"
        echo "$i"
        i=$((i+1))
    fi
done
The txt file has more than 50,000 records so I can't change them manually.
CodePudding user response:
This sed one-liner should do the job:
sed 's/^\(.\{14\}\)\([0-9]\{13\}[^0-9]\)/\1000\2/' file
This assumes serial numbers consist of decimal digits only and trusts that they all start at column 15 of every line.
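To see what the trailing [^0-9] buys you, a quick check on two shortened sample lines may help. This is only a sketch; the function name pad_serial_sed is made up for illustration and simply wraps the one-liner above:

```shell
#!/bin/bash
# pad_serial_sed is a hypothetical wrapper around the sed one-liner; it reads stdin.
pad_serial_sed() {
    # 14 leading characters, then exactly 13 digits followed by a non-digit
    sed 's/^\(.\{14\}\)\([0-9]\{13\}[^0-9]\)/\1000\2/'
}

# A 13-digit serial gets three zeros prepended
printf '%s\n' '1:6822:26: :A:0000000999993: :DIS:14516E :' | pad_serial_sed

# A 16-digit serial is left alone: the 14th character of the serial is a digit,
# so the [^0-9] part of the pattern cannot match
printf '%s\n' '1:6822:26: :3:0000000000170891: :AZDG-2 :' | pad_serial_sed
```

Without the [^0-9] guard, the first 13 digits of a healthy 16-digit serial would also match and get padded.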
Or, an awk solution:
awk 'BEGIN { FS=OFS=":" } length($6) == 13 { $6 = "000" $6 } 1 ' file
This one only checks whether the length of the sixth field is 13 and trusts that the sixth field is the serial number field.
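With more than 50,000 records it may be worth counting the serial lengths before and after the fix, so you can confirm that no 13-character serials remain. A minimal sketch, assuming the serial is field 6; the helper name count_lengths and the temporary sample files are made up for illustration:

```shell
#!/bin/bash
# Hypothetical two-line sample standing in for the real file
sample=$(mktemp)
cat > "$sample" <<'EOF'
1:6822:26: :A:0000000999993: :DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
EOF

# Print "<length> <count>" for every serial length found in field 6
count_lengths() {
    awk -F: '{ n[length($6)]++ } END { for (k in n) print k, n[k] }' "$1" | sort -n
}

echo "before:"
count_lengths "$sample"

# Apply the awk fix to a copy, never to the original
fixed=$(mktemp)
awk 'BEGIN { FS=OFS=":" } length($6) == 13 { $6 = "000" $6 } 1' "$sample" > "$fixed"

echo "after:"
count_lengths "$fixed"
```

Writing to a separate file first leaves the original untouched until the counts look right.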
CodePudding user response:
One awk idea that replaces all of OP's current code:
awk '
BEGIN         { FS=OFS=":" }               # set input/output field delimiter to ":"
length($6)<16 { $6=sprintf("%016d",$6) }   # if length of 6th field < 16 then left-pad the field with 0's to length of 16
1                                          # print current line
' file.txt
This generates:
1:6822:26: :A:0000000000999993:DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :A:0000000000999994:MAT:13L324 : :01: : : :: : : :
1:6822:26: :A:0000000000999995:CAT:P13WFB : :01: : : ::0529483697940: : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170892: :AZDG-3 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170893: :AZDG-4 :0000003999:01:0000000000: : :: : : :
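A printf-style format such as %016d pads by converting the value to a number and printing it at least 16 digits wide, filled with zeros on the left. A small sketch to try it on single values; the helper name pad16 is hypothetical, and this only suits serials made of decimal digits:

```shell
#!/bin/bash
# pad16 is a hypothetical helper: zero-pad a numeric string to 16 digits using awk
pad16() {
    # awk reads the string as a decimal number, so leading zeros are harmless here
    awk -v s="$1" 'BEGIN { printf "%016d\n", s }'
}

pad16 0000000999993       # damaged 13-digit serial
pad16 0000000000170891    # already 16 digits, printed unchanged
```

Because the field is converted to a number, any spaces around the digits are lost in the result.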
CodePudding user response:
I took the liberty to tack a : on ...
$ awk '{if(length($2)<19){$2=gensub(/^(:.:)/,"\\1000","1",$2)":"}}1' file.txt
1:6822:26: :A:0000000000999993: :DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :A:0000000000999994: :MAT:13L324 : :01: : : :: : : :
1:6822:26: :A:0000000000999995: :CAT:P13WFB : :01: : : ::0529483697940: : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170892: :AZDG-3 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170893: :AZDG-4 :0000003999:01:0000000000: : :: : : :
If that's not what you want, use this: awk '{if(length($2)<19){$2=gensub(/^(:.:)/,"\\1000","1",$2)}}1' file.txt
CodePudding user response:
Another alternative
awk -v{O,}FS=: '{$6=gensub(" ", "0", "g", sprintf("%16s", gensub(" ", "", "g", $6)))}1'
result
1:6822:26: :A:0000000000999993:DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :A:0000000000999994:MAT:13L324 : :01: : : :: : : :
1:6822:26: :A:0000000000999995:CAT:P13WFB : :01: : : ::0529483697940: : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170892: :AZDG-3 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170893: :AZDG-4 :0000003999:01:0000000000: : :: : : :
CodePudding user response:
Because your question is tagged with bash, here is a pure bash version as an object of study.
# enable extended globbing, needed for the +(0) pattern below
shopt -s extglob
# init array arr
arr=();
# read current row with field separator : from file to array arr
while IFS=":" read -r -a arr rest; do
    # remove leading zeros to avoid problem with octal numbers in bash
    # and then pad leading zeros
    printf -v arr[5] "%016d" "${arr[5]##+(0)}";
    # output array arr with field separator :
    for i in "${arr[@]}"; do
        printf '%s:' "$i";
    done;
    printf '\n';
done < file
Output:
1:6822:26: :A:0000000000999993: :DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :A:0000000000999994: :MAT:13L324 : :01: : : :: : : :
1:6822:26: :A:0000000000999995: :CAT:P13WFB : :01: : : ::0529483697940: : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170892: :AZDG-3 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170893: :AZDG-4 :0000003999:01:0000000000: : :: : : :
The tool of choice is certainly awk.