I want to validate a file which contains multiple lines in this format:
alphanumeric_word_with_spaces|alphanumeric_word_with_spaces|alphanumeric_word_with_spaces|alphanumeric_word_with_spaces|alphanumeric_word_with_spaces|alphanumeric_word_with_spaces|alphanumeric_word_with_spaces
So basically, each line is pipe-delimited, and I need to check whether the number of pipes is equal to a variable, say 10 for now. The number of pipes cannot be greater or less than 10. Some words may be empty strings as well, such as "||||". I just need to validate the pipe count. What's inside doesn't matter.
What would the regex for that be? I am doing this with shell scripting on Linux.
Also, this is just a single line. I have multiple lines in a single file (tens of thousands of records). What would be the best way to perform the validation? I have read about sed and other tools, but I am not sure which one would be faster.
CodePudding user response:
Just counting pipes:
^([^|]*\|){10}[^|]*$
Enforcing that values are alphanumeric/space too (the inline (?i) flag needs a PCRE-style engine, e.g. grep -P):
(?i)^([a-z0-9 ]*\|){10}[a-z0-9 ]*$
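As a rough sketch of how the pipe-counting pattern could be applied to a whole file, grep can list the lines that do not match it. The function name invalid_lines and the variable expected are made up for this illustration; the value 10 is the count mentioned in the question.

```shell
#!/bin/bash
# Hypothetical helper: print every line that does NOT contain exactly
# $expected pipes, prefixed with its line number.
expected=10   # assumed pipe count, taken from the question

invalid_lines() {
    # -E  extended regex, so {n} and ( ) work without backslashes
    # -v  select lines that do NOT match the pattern
    # -n  prefix each selected line with its line number
    grep -Evn "^([^|]*\|){${expected}}[^|]*\$" "$1"
}
```

Calling invalid_lines input.txt prints nothing when every line is valid. Note that grep exits with status 0 when it found at least one invalid line and 1 when it found none, so the status reads "inverted" compared with a typical validator.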
CodePudding user response:
File input.txt:
a b c|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b
a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b| 2 S
a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b
The script could be:
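#!/bin/bash
#
inputfile="input.txt"
expected=10   # required number of pipes per line
if [[ ! -f "$inputfile" ]]
then
    echo "The input file does not exist."
    exit 1
else
    # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal
    while IFS= read -r line
    do
        echo "LINE=$line"
        # With '|' as the field separator, pipes = fields - 1
        pipe_count=$(printf '%s\n' "$line" | awk -F'|' '{print NF-1}')
        if [[ $pipe_count -eq $expected ]]
        then
            echo "OK, $expected |"
        else
            echo "NOT OK, found $pipe_count |"
        fi
        echo ""
    done <"$inputfile"
fi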
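For tens of thousands of records, starting one awk process per line is likely to be the slow part of the loop above. A possible alternative, sketched here under the assumption that only the pipe count matters, is to let a single awk process read the whole file. The function name check_pipes and its optional second argument are invented for this example.

```shell
#!/bin/bash
# Hypothetical helper: report every line whose pipe count differs from the
# expected value (default 10), using one awk pass over the file.
check_pipes() {
    local file=$1 expected=${2:-10}
    awk -F'|' -v n="$expected" '
        # NF is the number of fields, so NF-1 is the number of pipes.
        # An empty line has NF == 0 and is counted as having 0 pipes.
        { pipes = (NF > 0 ? NF - 1 : 0) }
        pipes != n { printf "line %d: %d pipes\n", NR, pipes; bad = 1 }
        # Exit status 1 if any line was reported, 0 otherwise.
        END { exit bad }
    ' "$file"
}
```

It could be used as check_pipes input.txt && echo "all lines OK", or with a different count as check_pipes input.txt 6.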