How do I filter lines in a text file that start with a capital letter and end with a positive integer?

Time:12-22

I am trying to use a regex with the grep command in the Linux terminal to filter lines in a text file that start with a capital letter and end with a positive integer. Is there a way to modify my command so that it does this with a single call to grep instead of two? I am using Windows Subsystem for Linux with Ubuntu from the Microsoft Store.

Text File:

C line 1
c line 2
B line 3
d line 4
E line five

The command that I have gotten to work:

grep ^[A-Z] cap*| grep [0-9]$ cap*

The Output

C line 1
B line 3

This works, but I feel like the two regexes could be combined somehow. However,

grep ^[A-Z][0-9]$ 

does not yield the same result as the command above.

CodePudding user response:

You need to use

grep '^[A-Z].*[0-9]$'
grep '^[[:upper:]].*[0-9]$'

The regex matches:

  • ^ - start of the line
  • [A-Z] / [[:upper:]] - an uppercase letter
  • .* - zero or more of any char (by contrast, [^0-9]* would match zero or more non-digit chars only)
  • [0-9] - a digit
  • $ - end of the line.
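
As a quick check, the pattern can be run against the sample lines from the question. This is only a sketch: the temporary file and the variable names are assumptions made for illustration, not part of the original setup.

```shell
# Recreate the question's sample lines in a temporary file (hypothetical location).
sample=$(mktemp)
printf '%s\n' 'C line 1' 'c line 2' 'B line 3' 'd line 4' 'E line five' > "$sample"

# One grep call: an uppercase letter at the start, any text, a digit at the end.
matches=$(grep '^[[:upper:]].*[0-9]$' "$sample")
printf '%s\n' "$matches"
# prints:
# C line 1
# B line 3

rm -f "$sample"
```

The single quotes keep the shell from treating [ ] * and $ as its own syntax before grep sees the pattern.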

Also, if you want to make sure there is no - before the number at the end of string, you need to use a negated bracket expression, like

grep -E '^[[:upper:]](.*[^-0-9])?[1-9][0-9]*$'

Here, the POSIX ERE regex (enabled by the -E option) matches

  • ^[[:upper:]] - an uppercase letter at the start of the line and then
  • (.*[^-0-9])? - an optional group: any text followed by one char that is neither a digit nor -
  • [1-9] - a non-zero digit
  • [0-9]* - zero or more digits
  • $ - end of the line.
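
To see what the stricter pattern rejects, here is a small sketch with made-up input lines (they are assumptions, not taken from the question): one ends in a negative number, one in zero, and two in positive integers.

```shell
# Hypothetical test lines fed to grep on standard input.
filtered=$(printf '%s\n' 'A value -5' 'B value 0' 'C value 42' 'D 7' |
  grep -E '^[[:upper:]](.*[^-0-9])?[1-9][0-9]*$')
printf '%s\n' "$filtered"
# prints:
# C value 42
# D 7
```

The line ending in -5 is dropped because the char before the number is a -, and the line ending in 0 is dropped because the number must begin with a non-zero digit.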

CodePudding user response:

When you use a pipeline, you want the second grep to act on standard input, not on the file you originally grepped from.

grep '^[A-Z]' cap* | grep '[0-9]$'
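
A small sketch of the difference, using a temporary copy of the question's sample lines (the temporary file and the variable names are assumptions for illustration). It uses [[:upper:]] rather than [A-Z] so that the result does not depend on the locale.

```shell
sample=$(mktemp)
printf '%s\n' 'C line 1' 'c line 2' 'B line 3' 'd line 4' 'E line five' > "$sample"

# Second grep is given the file again, so it ignores the pipe
# and only applies the "ends with a digit" test to the whole file.
with_file=$(grep '^[[:upper:]]' "$sample" | grep '[0-9]$' "$sample")

# Second grep has no file operand, so it reads the first grep's output
# and both conditions apply.
from_pipe=$(grep '^[[:upper:]]' "$sample" | grep '[0-9]$')

printf 'with file operand:\n%s\n\nfrom the pipe:\n%s\n' "$with_file" "$from_pipe"
rm -f "$sample"
```

With this input, the first variant prints all four lines that end in a digit, while the second prints only C line 1 and B line 3.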

However, you need to expand the second regex if you want to exclude negative numbers. Anyway, a better solution altogether might be to switch to Awk:

awk '/^[A-Z]/ && /[0-9]$/ && $NF > 0' cap*

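The output format will be slightly different from grep's: when given several files, grep prefixes each match with the file name, whereas Awk prints only the line. If you want to include the name of the matching file, you have to specify that separately: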

awk '/^[A-Z]/ && /[0-9]$/ && $NF > 0 { print FILENAME ":" $0 }' cap*
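
The Awk variant can be tried on a few made-up lines; the input and the temporary file below are assumptions for illustration, not data from the question.

```shell
sample=$(mktemp)
printf '%s\n' 'A value -5' 'B value 0' 'C value 42' 'd value 7' > "$sample"

# $NF is the last whitespace-separated field; comparing it with 0
# rejects lines whose final number is negative or zero.
awk_out=$(awk '/^[A-Z]/ && /[0-9]$/ && $NF > 0 { print FILENAME ":" $0 }' "$sample")
printf '%s\n' "$awk_out"

rm -f "$sample"
```

Only the C value 42 line should be printed, prefixed with the temporary file's name and a colon.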

The regex ^[A-Z][0-9]$ matches only lines of exactly two characters: an uppercase letter followed by a digit. If you want to permit arbitrary text between them, use ^[A-Z].*[0-9]$. For something less arbitrary, replace .* with something a bit more specific, like (.*[^-0-9])? perhaps. The parentheses and the question mark (which makes the group optional) need grep -E; in the BRE dialect you get by default, put a backslash before each of them, although \? is a GNU extension rather than part of POSIX BRE.
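
For comparison, here is a sketch of the same pattern in both dialects. The input lines are made up for illustration. The BRE form writes the optional group with the interval \{0,1\}, which POSIX BRE defines, instead of relying on a backslashed question mark.

```shell
# Hypothetical input lines, not taken from the question.
input=$(printf '%s\n' 'A value -5' 'C value 42' 'D 7' 'e value 3')

# ERE (-E): parentheses and ? act as operators as written.
ere=$(printf '%s\n' "$input" | grep -E '^[[:upper:]](.*[^-0-9])?[1-9][0-9]*$')

# BRE (default): group with \( \) and make it optional with \{0,1\}.
bre=$(printf '%s\n' "$input" | grep '^[[:upper:]]\(.*[^-0-9]\)\{0,1\}[1-9][0-9]*$')

printf 'ERE:\n%s\nBRE:\n%s\n' "$ere" "$bre"
```

Both calls should select the same two lines, C value 42 and D 7.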
