regex for counting/validating the pipe

Time:10-02

I want to validate a file which contains multiple lines in this format:

alphanumeric_word_with_spaces|alphanumeric_word_with_spaces|alphanumeric_word_with_spaces|alphanumeric_word_with_spaces|alphanumeric_word_with_spaces|alphanumeric_word_with_spaces|alphanumeric_word_with_spaces

So basically, the line is pipe-delimited and I need to check whether the number of pipes equals a variable, say 10 for now. The number of pipes cannot be greater or less than 10. Some fields may be empty strings as well, such as "||||". I just need to validate the pipe count; what's inside doesn't matter.

What regex can I use for that? I am doing this with shell scripting on Linux.

Also, this is just a single line. I have multiple lines in a single file (tens of thousands of records). What would be the best way to perform the validation? I have read about sed and other tools, but I am not sure which one would be faster.

Answer:

Just counting pipes:

^([^|]*\|){10}[^|]*$
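Because this pattern is plain POSIX ERE, one way to check a whole file is a single grep call instead of a per-line loop, which also speaks to the speed question. A minimal sketch, assuming GNU or POSIX grep; the function name invalid_lines and the sample data are hypothetical, and the expected count is passed as a parameter so it can come from a variable:

```shell
#!/bin/bash
# Hypothetical helper: print (with line numbers) every line of file $2
# whose pipe count is not exactly $1. One grep process scans the whole file.
invalid_lines() {
    local expected=$1 file=$2
    # -E: extended regex, -v: print NON-matching lines, -n: prefix line numbers
    grep -nvE "^([^|]*\|){${expected}}[^|]*$" "$file"
}

# Demo on a throwaway sample file: 10 pipes, 10 empty fields' pipes, 6 pipes.
sample=$(mktemp)
ten_pipes=$(printf '|%.0s' {1..10})
printf '%s\n' 'a|b|c|d|e|f|g|h|i|j|k' "$ten_pipes" 'a|b|c|d|e|f|g' > "$sample"
invalid_lines 10 "$sample"   # reports only line 3
rm -f "$sample"
```

Note that with -v, grep exits with status 0 when it printed at least one invalid line and 1 when every line was valid.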

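Also enforcing that the values contain only letters, digits, and spaces (PCRE syntax, e.g. with grep -P, because POSIX ERE has no inline (?i) flag):

^(?i)([a-z0-9 ]*\|){10}[a-z0-9 ]*$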
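The inline (?i) flag is PCRE syntax (GNU grep -P), not POSIX ERE. A portable sketch of a stricter variant, swapping the case-insensitive [a-z] for the POSIX [:alnum:] class so that letters, digits, and spaces are accepted, matching the question's alphanumeric format. The function name invalid_alnum_lines is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print (with line numbers) lines of file $2 that do not
# have exactly $1 pipes, or that contain characters other than
# letters, digits and spaces. Uses POSIX ERE only (no grep -P needed).
invalid_alnum_lines() {
    local expected=$1 file=$2
    grep -nvE "^([[:alnum:] ]*\|){${expected}}[[:alnum:] ]*$" "$file"
}

sample=$(mktemp)
printf '%s\n' 'A 1|b|c|d|e|f|g|h|i|j|k' 'a-b|b|c|d|e|f|g|h|i|j|k' > "$sample"
invalid_alnum_lines 10 "$sample"   # reports line 2 (contains "-")
rm -f "$sample"
```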

Answer:

File input.txt:

a b c|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b
a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b| 2 S
a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b

The script could be:

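#!/bin/bash
#
# Check that every line of input.txt contains exactly $expected pipes.
inputfile="input.txt"
expected=10

if [[ ! -f "$inputfile" ]]
then
    echo "The input file does not exist." >&2
    exit 1
else
    # IFS= keeps leading/trailing spaces; the extra test also processes
    # a last line that has no trailing newline.
    while IFS= read -r line || [[ -n "$line" ]]
    do
        echo "LINE=$line"
        # Count "|" characters; gsub() returns 0 for an empty line.
        pipe_count=$(awk '{print gsub(/[|]/, "")}' <<< "$line")
        if (( pipe_count == expected ))
        then
            echo "OK, $expected |"
        else
            echo "NOT OK, $pipe_count |"
        fi
        echo ""
    done <"$inputfile"
fi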
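A loop like this starts a new awk process for every line, which gets slow for tens of thousands of records. A sketch of the same check done in a single awk pass over the whole file; the function name check_pipes, its message format, and the convention of exiting with status 1 when any line is invalid are illustrative assumptions:

```shell
#!/bin/bash
# Hypothetical helper: validate all lines of file $2 in ONE awk process.
# Prints a message for each line whose pipe count is not $1 and
# exits with status 1 if any such line was found, 0 otherwise.
check_pipes() {
    local expected=$1 file=$2
    awk -v n="$expected" '
        # gsub() returns how many "|" it replaced (here with itself),
        # so an empty line counts as 0 pipes rather than NF-1 = -1.
        { count = gsub(/[|]/, "|") }
        count != n { printf "line %d: NOT OK, %d |\n", NR, count; bad = 1 }
        END { exit bad }
    ' "$file"
}

sample=$(mktemp)
printf '%s\n' 'a|b|c|d|e|f|g|h|i|j|k' 'a|b|c|d|e|f|g' > "$sample"
check_pipes 10 "$sample"
echo "exit status: $?"
rm -f "$sample"
```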