Unix regex (ERE) for finding lines with 4 or more numbers

Time:02-11

I've tried so many variations, but none seem to work. I have no idea what I'm misunderstanding.

The last one I've tried is (\<[[:digit:]]+\>.*){4,}, but it doesn't even find lines like 123 123 123 123. The input lines can be anything (even something like hello 123 my 1 name 2 is 3).

I didn't specify this, sorry, but what I meant is: the line "123 a2" has 1 number, and the line "1 2 3 45" has 4 numbers.

CodePudding user response:

With your shown samples, please try the following awk code. There is no need for a loop here. This code was written and tested in GNU awk.

awk -v FPAT='(^|[[:space:]]+)[0-9]+([[:space:]]+|$)' 'NF>3' Input_file

Explanation: Adding detailed explanation for above.

  • GNU awk's FPAT variable defines fields by a regex that describes their content, rather than by a field separator.
  • The regex (^|[[:space:]]+)[0-9]+([[:space:]]+|$) matches a run of digits that is preceded by whitespace or the start of the line and followed by whitespace or the end of the line.
  • The main awk program checks the condition NF>3: if the number of fields is greater than 3, the line is printed.

CodePudding user response:

It seems there is a problem with quantifying the pattern in your grep command. If the first number has more than one digit and the others have just one, your regex works. Otherwise it does not; see the test results:

#!/bin/bash
s="123 1 2 3 - works, the first number is three digits, the rest are one digit
1 2 3 45 - does not work, the first number is one-digit
14 2 3 4 - works
123 a2 - not output as expected as there is just one number
12 and one 1 more and 23 and 6 and some more 3546
hello 123 my 1 name 2 is 3 - output fine since the first number is three-digit and the rest are one digit"

grep -E '(\<[[:digit:]]+\>.*){4,}' <<< "$s"

Output:

123 1 2 3 - works, the first number is three digits, the rest are one digit
14 2 3 4 - works
hello 123 my 1 name 2 is 3 - output fine since the first number is three-digit and the rest are one digit

If you rewrite it without {4,} it works:

grep -E '\<[0-9]+\>.*\<[0-9]+\>.*\<[0-9]+\>.*\<[0-9]+\>' file
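If the threshold is not always 4, the unrolled pattern can be generated instead of typed by hand. The following is only a sketch: min_numbers_pattern is a hypothetical helper name, not something from this thread, and it assumes a grep that supports the \< and \> word-boundary operators (GNU grep does).

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print an unrolled ERE that
# matches lines containing at least $1 whole-word numbers.
# Note: num, pat and i are plain global variables in POSIX sh.
min_numbers_pattern() {
  num='\<[0-9]+\>'
  pat=$num
  i=1
  # Append ".*<number>" once for every additional required number.
  while [ "$i" -lt "$1" ]; do
    pat="$pat.*$num"
    i=$((i + 1))
  done
  printf '%s\n' "$pat"
}

# Sample input taken from the question; "123 a2" has only one number.
printf '%s\n' '1 2 3 45' '123 a2' 'hello 123 my 1 name 2 is 3' |
  grep -E "$(min_numbers_pattern 4)"
```

With 4 as the argument, the helper prints the same pattern as the hand-written command above, so this changes how the pattern is built, not what it matches.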


You can use this awk in any environment, too:

awk '{cnt=0; for (i=1; i<=NF; ++i) { if ($i ~ /^[0-9]+$/) { cnt++ } } }cnt>3' file

Details:

  • cnt=0 - set the cnt counter variable to 0
  • for (i=1; i<=NF; ++i) {...} - iterate over all fields in the current record (=line)
  • if ($i ~ /^[0-9]+$/) { cnt++ } - if the field consists only of digits, increment cnt
  • cnt>3 - if cnt is greater than 3, print the record.
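To see what the counter does on each line, the same loop can print the count instead of filtering on it. This is a minimal sketch for checking input by eye; count_numbers is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/sh
# count_numbers is a hypothetical helper name, not part of the answer above.
# It prefixes every input line with the number of all-digit fields found.
count_numbers() {
  awk '{
    cnt = 0
    for (i = 1; i <= NF; ++i)
      if ($i ~ /^[0-9]+$/) cnt++
    print cnt ": " $0
  }'
}

# Sample lines taken from the question.
printf '%s\n' '123 a2' '1 2 3 45' 'hello 123 my 1 name 2 is 3' | count_numbers
```

Because awk splits on whitespace here, a field such as a2 or 3, is not counted as a number. That is slightly different from the \<...\> word-boundary patterns, where a number followed by punctuation would still match.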

A test run:

#!/bin/bash
s="123  123  123  123 
1 2 3 45
1 3 5
hello 123 my 1 name 2 is 3" 

awk '{cnt=0; for (i=1; i<=NF; ++i) { if ($i ~ /^[0-9]+$/) { cnt++ } } }cnt>3' <<< "$s"

Output:

123  123  123  123 
1 2 3 45
hello 123 my 1 name 2 is 3