I am attempting to use a regex with the grep command in the Linux terminal to filter lines in a text file that start with a capital letter and end with a positive integer. Is there a way to modify my command so that it does this all in one line, with a single call of grep instead of two? I am using Windows Subsystem for Linux and the Microsoft Store Ubuntu.
Text file:
C line 1
c line 2
B line 3
d line 4
E line five
The command that I have gotten to work:
grep ^[A-Z] cap*| grep [0-9]$ cap*
The output:
C line 1
B line 3
This works, but I feel like the two regexes could be combined somehow. However,
grep ^[A-Z][0-9]$
does not yield the same result as the command above.
Answer:
You need to use either of these:
grep '^[A-Z].*[0-9]$'
grep '^[[:upper:]].*[0-9]$'
See the online demo. The regex matches:
^ - start of string
[A-Z] / [[:upper:]] - an uppercase letter
.* - any zero or more chars ([^0-9]* would instead match zero or more non-digit chars)
[0-9] - a digit
$ - end of string.
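A minimal sketch for checking the single-call commands against the sample lines from the question, assuming GNU grep as shipped with Ubuntu. The scratch directory and the file name cap.txt are hypothetical stand-ins for the asker's cap* file, and the C locale is set so that the character ranges mean plain ASCII letters.

```shell
#!/bin/sh
# Use the C locale so that [A-Z] and [[:upper:]] mean exactly the ASCII letters A to Z.
export LC_ALL=C

# Recreate the question's sample input; cap.txt is a hypothetical file name.
tmp=$(mktemp -d)
printf '%s\n' 'C line 1' 'c line 2' 'B line 3' 'd line 4' 'E line five' > "$tmp/cap.txt"

# One grep call: uppercase letter first, anything in between, digit last.
grep '^[A-Z].*[0-9]$' "$tmp/cap.txt"

# The same filter written with the [[:upper:]] character class.
grep '^[[:upper:]].*[0-9]$' "$tmp/cap.txt"
```

Each command should print C line 1 and B line 3, the same result as the two-grep version.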
Also, if you want to make sure there is no minus sign (-) before the number at the end of the string, you need to use a negated bracket expression, like
grep -E '^[[:upper:]](.*[^-0-9])?[1-9][0-9]*$'
Here, the POSIX ERE regex (ERE because of the -E option) matches:
^[[:upper:]] - an uppercase letter at the start
(.*[^-0-9])? - an optional occurrence of any text followed by one char that is neither a digit nor -
[1-9] - a non-zero digit
[0-9]* - zero or more digits
$ - end of string.
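A sketch comparing the simple pattern with the ERE on a few extra test lines, run in the C locale with GNU grep. The lines are made up for illustration: the simple pattern also accepts the lines ending in -7, 0 and 05, while the ERE keeps only those whose trailing number starts with 1 to 9 and is not preceded by a minus sign.

```shell
#!/bin/sh
# C locale so that [[:upper:]] means exactly A to Z.
export LC_ALL=C

# Hypothetical lines: negative, zero, leading zero, and a lowercase start.
tmp=$(mktemp -d)
printf '%s\n' 'A line 7' 'B line -7' 'C line 0' 'D line 42' 'E line 05' 'F line 10' 'g line 3' > "$tmp/nums.txt"

# Simple pattern: every uppercase-initial line that ends in a digit (6 lines here).
grep '^[[:upper:]].*[0-9]$' "$tmp/nums.txt"

# ERE: the trailing number must start with 1-9 and must not follow a - or another digit.
grep -E '^[[:upper:]](.*[^-0-9])?[1-9][0-9]*$' "$tmp/nums.txt"
```

The second command should print A line 7, D line 42 and F line 10.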
Answer:
When you use a pipeline, you want the second grep to act on standard input, not on the file you originally grepped from. Quoting the regexes also keeps the shell from treating the brackets as a filename pattern:
grep '^[A-Z]' cap* | grep '[0-9]$'
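A sketch of the corrected pipeline on a hypothetical cap.txt holding the question's sample lines (C locale, GNU grep), plus the original form for contrast: when the second grep is given a file operand, it reads that file and never looks at the pipe.

```shell
#!/bin/sh
# C locale so that [A-Z] means exactly the ASCII uppercase letters.
export LC_ALL=C
tmp=$(mktemp -d)
printf '%s\n' 'C line 1' 'c line 2' 'B line 3' 'd line 4' 'E line five' > "$tmp/cap.txt"

# Correct: the second grep has no file operand, so it filters standard input.
grep '^[A-Z]' "$tmp/cap.txt" | grep '[0-9]$'

# Buggy: with a file operand, the second grep ignores the pipe and rereads the file,
# so lowercase lines ending in a digit slip through (4 lines here).
grep '^[A-Z]' "$tmp/cap.txt" | grep '[0-9]$' "$tmp/cap.txt"
```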
However, you need to expand the second regex if you want to exclude negative numbers. Anyway, a better solution altogether might be to switch to Awk:
awk '/^[A-Z]/ && /[0-9]$/ && $NF > 0' cap*
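A sketch running the Awk version over made-up lines in the C locale. Because the last field looks numeric, $NF > 0 is a numeric comparison here: -7 and 0 are rejected, but 05 is accepted as the number 5, unlike with the grep -E pattern in the other answer.

```shell
#!/bin/sh
# C locale so that [A-Z] means exactly the ASCII uppercase letters.
export LC_ALL=C
tmp=$(mktemp -d)
printf '%s\n' 'A line 7' 'B line -7' 'C line 0' 'D line 42' 'E line 05' 'F line 10' 'g line 3' > "$tmp/nums.txt"

# Uppercase first char, digit last, and the last field must be numerically > 0.
awk '/^[A-Z]/ && /[0-9]$/ && $NF > 0' "$tmp/nums.txt"
```

This should print A line 7, D line 42, E line 05 and F line 10.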
The output format differs slightly from grep's: when cap* expands to more than one file, grep prefixes each match with the file name, but Awk does not, so if you want to include the name of the matching file, you have to print it yourself:
awk '/^[A-Z]/ && /[0-9]$/ && $NF > 0 { print FILENAME ":" $0 }' cap*
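A sketch with two hypothetical files, cap1.txt and cap2.txt, showing that printing FILENAME ":" $0 mimics the file:line prefix grep adds by itself when it searches more than one file. It runs in the C locale inside a scratch directory so the glob expands to short names.

```shell
#!/bin/sh
# C locale so that [A-Z] means exactly the ASCII uppercase letters.
export LC_ALL=C
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Two hypothetical input files matched by the cap* glob.
printf '%s\n' 'C line 1' 'c line 2' > cap1.txt
printf '%s\n' 'B line 3' 'D line -4' > cap2.txt

# Awk: prefix each match with the name of the file it came from.
awk '/^[A-Z]/ && /[0-9]$/ && $NF > 0 { print FILENAME ":" $0 }' cap*

# grep: adds the prefix itself with several files (and keeps D line -4).
grep '^[A-Z].*[0-9]$' cap*
```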
The regex ^[A-Z][0-9]$ matches only lines that are exactly two characters long: the first must be an uppercase letter and the second a digit. If you want to permit arbitrary text between them, that would be ^[A-Z].*[0-9]$ (and for something less arbitrary, use something a bit more specific than .*, perhaps (.*[^-0-9])?, for which you need grep -E so that the parentheses and the question mark are special; in the basic (BRE) dialect grep uses out of the box, write the group as \(.*[^-0-9]\) and make it optional with \{0,1\}, since \? is a GNU extension rather than POSIX).
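A sketch contrasting the asker's regex with the .* version, and checking that the BRE spelling of the optional group behaves like the ERE one. The test lines are invented for illustration, and it assumes GNU grep in the C locale.

```shell
#!/bin/sh
# C locale so that [A-Z] and [[:upper:]] mean exactly A to Z.
export LC_ALL=C
tmp=$(mktemp -d)
printf '%s\n' 'C5' 'C line 1' 'B line -3' > "$tmp/t.txt"

# Exactly two characters: prints only C5.
grep '^[A-Z][0-9]$' "$tmp/t.txt"

# Arbitrary text allowed between first and last char: prints all three lines.
grep '^[A-Z].*[0-9]$' "$tmp/t.txt"

# ERE: (...)? makes the group optional; prints C5 and C line 1.
grep -E '^[[:upper:]](.*[^-0-9])?[1-9][0-9]*$' "$tmp/t.txt"

# POSIX BRE: escaped group, \{0,1\} for "optional"; same two lines.
grep '^[[:upper:]]\(.*[^-0-9]\)\{0,1\}[1-9][0-9]*$' "$tmp/t.txt"
```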