header columns validation in shell script

Time:10-28

The "|"-delimited files should have the following column headers:

Activity
Activity  ID
Description
Status

After upload, and before processing the file with SQLLDR, I check that the uploaded file has the exact number of headers, and that the header names match and are in the same order.

Code:

declare -i header=4
fields=( 
"Activity"
"Activity  ID"
"Description"
"Status"
)

for i in "Test File.csv"; do
    read -r line < "$i" 

    oldIFS="$IFS"
    IFS=$'|'
    fldarray=( $line );
    IFS="$oldIFS"

    nfields=${#fldarray[@]}     
    if (( nfields < header ))
    then
    printf "error: only '%d' fields in file '%s'\nmissing:" "$nfields" "$i"
    else        
        for item1 in "${header[@]}"; do
          for item2 in "${fields[@]}"; do
           if [[ $item1 != $item2 ]]; then
            Array3+=("$item1")
           fi
         done
        done
        echo "not matching" ${Array3[@]}
        printf "\n\n"
    fi
done

Data:

Activity|Activity  ID|Description|Status
Test|1234|First activity|Open

This always reports that the Activity column is missing, even though it is present in the file. After I remove the " " from the header names in both the script and the uploaded file, it works as expected. How can I change the above code to validate column headers that contain " "? I built this solution from the answer to "bash to identify and verify file headers".

Answer:

NOTE: it is not entirely clear what OP wants to do (e.g., header is declared as an integer but later referenced as an array, "${header[@]}")
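To illustrate the mismatch noted above: in bash, a scalar referenced with array syntax behaves like a one-element array, so the nested loops in the question end up comparing the value 4 against each expected name rather than comparing header names. A minimal sketch (the items[] array is a hypothetical name used only for this demonstration):

```shell
#!/usr/bin/env bash
declare -i header=4                 # scalar integer, declared as in the question

items=()                            # hypothetical array, only for this demo
for item in "${header[@]}"; do      # expands to the single word "4"
    items+=("$item")
done

# Show how many words the expansion produced and what the first one is
printf 'count=%d first=%s\n' "${#items[@]}" "${items[0]}"
```

This prints count=1 first=4, which is why the original loop never sees the names read from the file.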

Assumptions:

  • print an error if the number of | delimited fields in the first row of the .csv file does not match the number of entries in the fields[] array
  • header fields from the .csv file must exactly match (spelling and order) the entries in the fields[] array
  • print the entries from the fields[] array that don't have an exact match with the | delimited fields from the first row of the .csv file

We'll keep the current fields[] array:

fields=("Activity" "Activity  ID" "Description" "Status")

Then pull the first line of the .csv file into the headers[] array:

IFS='|' read -r -a headers < test.csv      # read first line from test.csv, break on '|' delimiter, store in headers[] array

Giving us:

$ typeset -p fields headers
declare -a fields=([0]="Activity" [1]="Activity  ID" [2]="Description" [3]="Status")
declare -a headers=([0]="Activity" [1]="Activity  ID" [2]="Description" [3]="Status")
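The following sketch reproduces that result end to end and shows that the two spaces inside "Activity  ID" survive the split, because only '|' is in IFS for the read. The temporary file and the tmpfile variable are assumptions made for the demo; they stand in for the uploaded file.

```shell
#!/usr/bin/env bash
# Build a throwaway sample file with the data from the question
tmpfile=$(mktemp)
printf '%s\n' 'Activity|Activity  ID|Description|Status' \
              'Test|1234|First activity|Open' > "$tmpfile"

# Read only the first line, splitting on '|' and nothing else
IFS='|' read -r -a headers < "$tmpfile"
rm -f "$tmpfile"

# Bracket each entry so embedded spaces are visible
for h in "${headers[@]}"; do
    printf '[%s]\n' "$h"
done
```

The second line of output is [Activity  ID], with both spaces intact. Quoting "${headers[@]}" in the loop matters: an unquoted expansion would be split again on whitespace.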

Now make some modifications to OP's if/else/for/fi code:

if [[ "${#fields[@]}" -ne "${#headers[@]}" ]]            # field count mismatch?
then
    echo "error: field count mismatch: expecting ${#fields[@]} / found ${#headers[@]}"
else
    Array3=()                                            # init array Array3[]

    for ((i=0;i<${#fields[@]};i++))                      # loop through indices of fields[] array
    do
        [[ "${fields[$i]}" != "${headers[$i]}" ]] &&     # if same position in both arrays is not a match then ...
        Array3+=("${fields[$i]}")                        # add fields[] entry to Array3[]
    done

    [[ "${#Array3[@]}" -ne 0 ]] &&                       # if Array3[] not empty then ...
    echo "not matching: ${Array3[*]}"                    # print list of fields to stdout (quoted to keep embedded spaces)
fi

For this particular case, where ${fields[@]} and ${headers[@]} are identical, no output is generated.

Other test cases:

2nd field in headers[] is spelled differently

declare -a fields=([0]="Activity" [1]="Activity  ID" [2]="Description" [3]="Status")
declare -a headers=([0]="Activity" [1]="Activity " [2]="Description" [3]="Status")

# the code generates:

not matching: Activity  ID

headers[] has 3 entries

declare -a fields=([0]="Activity" [1]="Activity  ID" [2]="Description" [3]="Status")
declare -a headers=([0]="Activity" [1]="Activity  ID" [2]="Status")

# the code generates:

error: field count mismatch: expecting 4 / found 3

headers[] has 4 entries but all differ from corresponding entry in fields[]

declare -a fields=([0]="Activity" [1]="Activity  ID" [2]="Description" [3]="Status")
declare -a headers=([0]="Activity  ID" [1]="Description" [2]="Status" [3]="Activity")

# the code generates:

not matching: Activity Activity  ID Description Status

From here OP should be able to tweak the code to produce the desired output and/or set variables for follow-on conditional operations (e.g., abort or skip further processing if either echo is triggered).
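One possible way to wire this into follow-on processing is to wrap the checks in a function and let its exit status decide whether to continue. This is only a sketch: validate_headers, the sample temporary file and the messages are hypothetical names chosen for illustration, and the actual load step is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when the first line of file "$1" matches the
# expected header names given as the remaining arguments, non-zero otherwise.
validate_headers() {
    local file=$1; shift
    local -a expected=("$@") headers=()
    local i status=0

    IFS='|' read -r -a headers < "$file"

    if (( ${#headers[@]} != ${#expected[@]} )); then
        echo "error: field count mismatch: expecting ${#expected[@]} / found ${#headers[@]}" >&2
        return 1
    fi

    for (( i = 0; i < ${#expected[@]}; i++ )); do
        if [[ "${expected[i]}" != "${headers[i]}" ]]; then
            echo "not matching: ${expected[i]}" >&2
            status=1
        fi
    done
    return "$status"
}

fields=("Activity" "Activity  ID" "Description" "Status")

# Demo with a temporary file standing in for the uploaded one
sample=$(mktemp)
printf '%s\n' 'Activity|Activity  ID|Description|Status' > "$sample"

if validate_headers "$sample" "${fields[@]}"; then
    echo "headers OK, safe to continue with the load"
else
    echo "headers invalid, skipping the load" >&2
fi
rm -f "$sample"
```

Returning a status instead of only printing keeps the decision to abort or skip in the caller, so the same function can guard the load step for any file.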
