How can I append a string on a specific column on the lines that match a condition on a txt file usi

I have a text file with a bunch of serial numbers that are supposed to be 16 characters long, but some of the records were damaged and are only 13 characters long. I want to add 3 zeros at the beginning of every serial number that is 13 characters long.

Note: The serial numbers don't start at the beginning of the line; they all start at column 15 of every line.

My file currently looks like this:

1:6822:26: :A:0000000999993: :DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :A:0000000999994: :MAT:13L324 : :01: : : :: : : :
1:6822:26: :A:0000000999995: :CAT:P13WFB : :01: : : ::0529483697940: : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170892: :AZDG-3 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170893: :AZDG-4 :0000003999:01:0000000000: : :: : : :

And the output should be:

1:6822:26: :A:0000000000999993: :DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :A:0000000000999994: :MAT:13L324 : :01: : : :: : : :
1:6822:26: :A:0000000000999995: :CAT:P13WFB : :01: : : ::0529483697940: : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170892: :AZDG-3 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170893: :AZDG-4 :0000003999:01:0000000000: : :: : : :

This is the code I made to get the records that are shortened:

    #!/bin/bash
    i=1
    for OUTPUT in $(cut -c15-30 file.txt)
    do
       if [[ ${#OUTPUT} == 13 ]]
       then 
              echo $OUTPUT
              echo $i
              i=$((i+1))
    
       fi
    done

The txt file has more than 50,000 records so I can't change them manually.

CodePudding user response：

This sed one-liner should do the job:

sed 's/^\(.\{14\}\)\([0-9]\{13\}[^0-9]\)/\1000\2/' file

This assumes the serial numbers consist of decimal digits only and trusts that they all start at column 15 of every line.
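Since the sed command relies on those two assumptions, it may be worth counting the affected lines before and after the substitution. The following is only a sketch: the two-line sample file is a stand-in for the real data, and the variable names (f, pattern, before, after) are made up for illustration.

```shell
# Hypothetical two-line sample; point "$f" at the real file in practice.
f=$(mktemp)
cat > "$f" <<'EOF'
1:6822:26: :A:0000000999993: :DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
EOF

# Same pattern the sed command matches: 14 characters, 13 digits, one non-digit.
pattern='^.\{14\}[0-9]\{13\}[^0-9]'

# Number of lines the substitution would change.
before=$(grep -c "$pattern" "$f" || true)

# Apply the substitution and count again; no short record should remain.
fixed=$(sed 's/^\(.\{14\}\)\([0-9]\{13\}[^0-9]\)/\1000\2/' "$f")
after=$(printf '%s\n' "$fixed" | grep -c "$pattern" || true)

echo "short records before: $before, after: $after"
rm -f "$f"
```

If the count before the substitution does not match the number of damaged records you expect, the column or digits-only assumption probably does not hold for your file.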

Or, an awk solution:

awk 'BEGIN { FS=OFS=":" } length($6) == 13 { $6 = "000" $6 } 1 ' file

This one only checks whether the length of the sixth field is 13 and trusts that the sixth field is the serial number field.
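If trusting the length alone feels too loose, the condition can also require that the field contains digits only. This is a sketch under the same assumption that the serial number is the sixth field; the line with ABCDEFGHIJKLM is invented purely to show a 13-character non-numeric field being left alone.

```shell
# Hypothetical sample; the second line is made up for illustration.
f=$(mktemp)
cat > "$f" <<'EOF'
1:6822:26: :A:0000000999993: :DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :A:ABCDEFGHIJKLM: :DIS:14516E : :01: : : :: : : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
EOF

# Pad only when field 6 is exactly 13 characters long AND all digits.
padded=$(awk 'BEGIN { FS = OFS = ":" }
  length($6) == 13 && $6 ~ /^[0-9]+$/ { $6 = "000" $6 }
  1' "$f")

printf '%s\n' "$padded"
rm -f "$f"
```

Only the first line changes here; the other two are printed as they came in.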

CodePudding user response：

One awk idea that replaces all of OP's current code:

awk '
BEGIN         { FS=OFS=":" }                # set input/output field delimiter to ":"
length($6)<16 { $6=sprintf("%016d",$6) }    # if length of 6th field < 16 then left-pad the field with zeros to length of 16
1                                           # print current line
' file.txt

This generates:

1:6822:26: :A:0000000000999993: :DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :A:0000000000999994: :MAT:13L324 : :01: : : :: : : :
1:6822:26: :A:0000000000999995: :CAT:P13WFB : :01: : : ::0529483697940: : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170892: :AZDG-3 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170893: :AZDG-4 :0000003999:01:0000000000: : :: : : :

CodePudding user response：

I took the liberty to tack a : on ...

$ awk '{if(length($2)<19){$2=gensub(/^(:.:)/,"\\1000","1",$2)":"}}1' file.txt 
1:6822:26: :A:0000000000999993: :DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :A:0000000000999994: :MAT:13L324 : :01: : : :: : : :
1:6822:26: :A:0000000000999995: :CAT:P13WFB : :01: : : ::0529483697940: : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170892: :AZDG-3 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170893: :AZDG-4 :0000003999:01:0000000000: : :: : : :

If that's not what you want, use this: awk '{if(length($2)<19){$2=gensub(/^(:.:)/,"\\1000","1",$2)}}1' file.txt

CodePudding user response：

Another alternative

awk -v{O,}FS=: '{$6=gensub(" ", "0", "g", sprintf("%16s", gensub(" ", "", "g", $6)))}1' file.txt

result

1:6822:26: :A:0000000000999993: :DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :A:0000000000999994: :MAT:13L324 : :01: : : :: : : :
1:6822:26: :A:0000000000999995: :CAT:P13WFB : :01: : : ::0529483697940: : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170892: :AZDG-3 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170893: :AZDG-4 :0000003999:01:0000000000: : :: : : :
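Note that gensub is a GNU awk extension, so the command above needs gawk. A sketch of the same idea using only sprintf and gsub, which any POSIX awk should provide, could look like this (the sample file and the variable names are illustrative, not part of the original answer):

```shell
# Hypothetical two-line sample; point "$f" at the real file in practice.
f=$(mktemp)
cat > "$f" <<'EOF'
1:6822:26: :A:0000000999993: :DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
EOF

portable=$(awk 'BEGIN { FS = OFS = ":" }
{
  s = $6
  gsub(/ /, "", s)          # drop any spaces inside the field
  s = sprintf("%16s", s)    # right-justify to width 16, padding with spaces
  gsub(/ /, "0", s)         # turn the padding spaces into zeros
  $6 = s
}
1' "$f")

printf '%s\n' "$portable"
rm -f "$f"
```

Because the field is treated as a string throughout, this variant does not depend on the serial being numeric, unlike a %d-based format.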

CodePudding user response：

Because your question is tagged with bash, here is a pure bash version as an object of study.

# enable extended globbing so that the +(0) pattern below is recognized
shopt -s extglob

# init array arr
arr=();

# read current row with field separator : from file to array arr
while IFS=":" read -r -a arr; do

  # remove leading zeros to avoid problem with octal numbers in bash
  # and then pad leading zeros
  printf -v 'arr[5]' "%016d" "${arr[5]##+(0)}";

  # output array arr with field separator :
  for i in "${arr[@]}"; do
    printf '%s:' "$i";
  done;
  printf '\n';

done < file

Output:

1:6822:26: :A:0000000000999993: :DIS:14516E : :01: : : ::0529483733710: : :
1:6822:26: :A:0000000000999994: :MAT:13L324 : :01: : : :: : : :
1:6822:26: :A:0000000000999995: :CAT:P13WFB : :01: : : ::0529483697940: : :
1:6822:26: :3:0000000000170891: :AZDG-2 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170892: :AZDG-3 :0000003999:01:0000000000: : :: : : :
1:6822:26: :3:0000000000170893: :AZDG-4 :0000003999:01:0000000000: : :: : : :

The tool of choice is certainly awk.
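For the padding step on its own, the leading zeros can also be stripped with the plain ${var#0} expansion instead of an extended glob pattern. This is just a sketch: pad_serial is a made-up helper name, and it assumes its argument consists of decimal digits only.

```shell
# pad_serial is a hypothetical helper; $1 is assumed to be digits only.
pad_serial() {
  s=$1
  # Drop one leading zero per iteration until none is left,
  # so printf does not read the value as an octal number.
  while [ "${s#0}" != "$s" ]; do
    s=${s#0}
  done
  # An all-zero serial ends up empty, so fall back to 0.
  printf '%016d' "${s:-0}"
}

short=$(pad_serial 0000000999993)
full=$(pad_serial 0000000000170891)
echo "$short"   # 0000000000999993
echo "$full"    # 0000000000170891
```

A serial that is already 16 digits long comes back unchanged, so the helper can be applied to every record without checking the length first.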