The "|"-delimited files should have the following column headers:
Activity
Activity ID
Description
Status
After upload, and before processing the file with SQLLDR, I want to make sure the uploaded file has the exact number of headers, and that the header names match and are in the same order.
Code:
declare -i header=4
fields=(
"Activity"
"Activity ID"
"Description"
"Status"
)
for i in "Test File.csv"; do
read -r line < "$i"
oldIFS="$IFS"
IFS=$'|'
fldarray=( $line );
IFS="$oldIFS"
nfields=${#fldarray[@]}
if (( nfields < header ))
then
printf "error: only '%d' fields in file '%s'\nmissing:" "$nfields" "$i"
else
for item1 in "${header[@]}"; do
for item2 in "${fields[@]}"; do
if [[ $item1 != $item2 ]]; then
Array3+=("$item1")
fi
done
done
echo "not matching" ${Array3[@]}
printf "\n\n"
fi
done
Data:
Activity|Activity ID|Description|Status
Test|1234|First activity|Open
This always reports that the Activity column is missing, even though it is present in the file. After I remove the space (" ") from the header names in both the script and the uploaded file, it works as expected. How can I change the above code to validate column headers that contain spaces? I based this solution on the answer to "bash to identify and verify file headers".
CodePudding user response:
NOTE: still a bit confused as to what OP wants to do (eg, header is defined as an integer but later referenced as an array ("${header[@]}"))
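To make the note above concrete, here is a minimal sketch of what bash does when an integer variable is expanded as if it were an array. The variable name header mirrors OP's script; items is a hypothetical name used only for illustration. The expansion produces a single element (the value 4), so a loop over "${header[@]}" never sees any header names.

```bash
#!/usr/bin/env bash
declare -i header=4            # an integer scalar, as in OP's script

# Expanding a scalar with [@] yields exactly one word: its value.
items=( "${header[@]}" )       # 'items' is a hypothetical name for this demo

echo "count: ${#items[@]}"     # prints: count: 1
echo "first: ${items[0]}"      # prints: first: 4
```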
Assumptions:
- print an error if the number of | delimited fields in the first row of the .csv file does not match the number of entries in the fields[] array
- header fields from the .csv file must be an exact match (spelling and order) as the entries in the fields[] array
- print the entries from the fields[] array that don't have an exact match with the | delimited fields from the first row of the .csv file
We'll keep the current fields[] array:
fields=("Activity" "Activity ID" "Description" "Status")
Then pull the first line of the .csv file into the headers[] array:
IFS='|' read -r -a headers < test.csv # read first line from test.csv, break on '|' delimiter, store in headers[] array
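As a self-contained check that embedded spaces survive this read, the sketch below writes OP's sample data to a temporary file and prints each parsed header in brackets. The temporary directory and the csv variable are assumptions made for the demo; only the read line comes from the answer itself.

```bash
#!/usr/bin/env bash
# Illustrative setup: a temporary directory holding OP's sample data.
tmpdir=$(mktemp -d)
csv="$tmpdir/Test File.csv"    # file name contains a space, so quote it everywhere
printf '%s\n' 'Activity|Activity ID|Description|Status' \
              'Test|1234|First activity|Open' > "$csv"

# Split the first line on '|' only; the space inside "Activity ID" is kept.
IFS='|' read -r -a headers < "$csv"

# Brackets make any leading or trailing whitespace visible.
for h in "${headers[@]}"; do
    printf '[%s]\n' "$h"
done

rm -rf "$tmpdir"
```

Because IFS is set only for the duration of the read command, there is no need to save and restore it as in OP's script.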
Giving us:
$ typeset -p fields headers
declare -a fields=([0]="Activity" [1]="Activity ID" [2]="Description" [3]="Status")
declare -a headers=([0]="Activity" [1]="Activity ID" [2]="Description" [3]="Status")
Now make some modifications to OP's if/else/for/fi code:
if [[ "${#fields[@]}" -ne "${#headers[@]}" ]]            # field count mismatch?
then
    echo "error: field count mismatch: expecting ${#fields[@]} / found ${#headers[@]}"
else
    Array3=()                                            # init array Array3[]
    for ((i=0;i<${#fields[@]};i++))                      # loop through indices of fields[] array
    do
        if [[ "${fields[$i]}" != "${headers[$i]}" ]]     # if same position in both arrays is not a match then ...
        then
            Array3+=("${fields[$i]}")                    # add fields[] entry to Array3[]
        fi
    done
    if [[ "${#Array3[@]}" -ne 0 ]]                       # if Array3[] not empty then ...
    then
        echo "not matching: ${Array3[*]}"                # print list of fields to stdout
    fi
fi
For this particular case, where ${fields[@]} and ${headers[@]} are identical, no output is generated.
Other test cases:
2nd field in headers[] is spelled differently
declare -a fields=([0]="Activity" [1]="Activity ID" [2]="Description" [3]="Status")
declare -a headers=([0]="Activity" [1]="Activity " [2]="Description" [3]="Status")
# the code generates:
not matching: Activity ID
headers[] has 3 entries
declare -a fields=([0]="Activity" [1]="Activity ID" [2]="Description" [3]="Status")
declare -a headers=([0]="Activity" [1]="Activity ID" [2]="Status")
# the code generates:
error: field count mismatch: expecting 4 / found 3
headers[] has 4 entries but all differ from corresponding entry in fields[]
declare -a fields=([0]="Activity" [1]="Activity ID" [2]="Description" [3]="Status")
declare -a headers=([0]="Activity ID" [1]="Description" [2]="Status" [3]="Activity")
# the code generates:
not matching: Activity Activity ID Description Status
From here OP should be able to tweak the code to provide the desired outputs and/or set some variables to use for follow-on conditional operations (eg, abort processing or disable follow-on processing if either echo is triggered).
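One possible way to wire this into follow-on processing is to wrap the check in a function that returns a non-zero status on any mismatch. The sketch below is only an illustration under that assumption: validate_headers, mismatched, and the temporary demo files are hypothetical names, not part of OP's script.

```bash
#!/usr/bin/env bash
fields=("Activity" "Activity ID" "Description" "Status")

# validate_headers FILE   (hypothetical helper)
# Returns 0 if the first line of FILE matches fields[] exactly
# (count, spelling and order); otherwise prints a message and returns 1.
validate_headers() {
    local file=$1 i
    local -a headers mismatched=()

    [[ -r "$file" ]] || { echo "error: cannot read '$file'"; return 1; }
    IFS='|' read -r -a headers < "$file"

    if (( ${#fields[@]} != ${#headers[@]} )); then
        echo "error: field count mismatch: expecting ${#fields[@]} / found ${#headers[@]}"
        return 1
    fi

    for (( i = 0; i < ${#fields[@]}; i++ )); do
        [[ "${fields[i]}" == "${headers[i]}" ]] || mismatched+=("${fields[i]}")
    done

    if (( ${#mismatched[@]} > 0 )); then
        echo "not matching: ${mismatched[*]}"
        return 1
    fi
    return 0
}

# Demo with two throwaway files: one valid header row, one missing a column.
tmpdir=$(mktemp -d)
printf '%s\n' 'Activity|Activity ID|Description|Status' > "$tmpdir/good.csv"
printf '%s\n' 'Activity|Activity ID|Status'             > "$tmpdir/bad.csv"

validate_headers "$tmpdir/good.csv"; good_status=$?
bad_output=$(validate_headers "$tmpdir/bad.csv"); bad_status=$?
echo "good: $good_status, bad: $bad_status ($bad_output)"

rm -rf "$tmpdir"
```

In a real script the call could gate the load step, for example validate_headers "$file" || exit 1 placed before the SQLLDR invocation.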