I'm trying to print lines from a file if, and only if, every character in a line meets a certain regex condition.
The problem is that any line that contains any character that meets the regex condition evaluates to true and gets printed, even if it also contains characters outside that range.
I'd prefer to use awk, as I already have additional conditions in place that I would like the evaluated line to meet, and would prefer the solution to implement basic regex so I can apply different matching conditions on future files (whereas the grep solution shown here focuses on non-ASCII identification and seems to require --perl-regexp compatibility -- my focus is on meeting a given regex condition across an entire given line).
In the example, uppercase letters fall outside the regex condition and therefore the whole line where they appear should be ignored.
file.txt:
abc123
123abc
123ABC
AbCdEf
When I try...
awk '$0 ~ /[a-z]/ || $0 ~ /[0-9]/' < file.txt
...every line is printed, since the regex condition is met at least once in each line:
abc123
123abc
123ABC
AbCdEf
What I want is to not print a line if any character outside the [a-z] and/or [0-9] range is present, so the desired output here would be:
abc123
123abc
The closest hits I could find when researching this are here and here, but I don't want to search-and-replace anything on the line, I just want to ignore the line and move on to the next one if any unwanted characters are present.
CodePudding user response:
For the given sample input/output, try these:
$ awk '!/[^a-z0-9]/' ip.txt
abc123
123abc
$ grep -v '[^a-z0-9]' ip.txt
abc123
123abc
[^set] means match any character except s, e, or t. In other words, ^ at the beginning of the class inverts the set of characters to be matched.
! and -v are used to print lines that do not match the given condition.
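As a minimal sketch of the negated-class approach, the sample data from the question can be recreated in a temporary file and filtered. The variable names tmp and result are illustrative only. In some locales a range such as [a-z] may also match uppercase letters, so the sketch sets LC_ALL=C; that setting is an assumption and not part of the commands above.

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
printf '%s\n' abc123 123abc 123ABC AbCdEf > "$tmp"

# Keep only lines that contain no character outside a-z and 0-9.
# LC_ALL=C is an assumption, used so [a-z] covers exactly the ASCII lowercase letters.
result=$(LC_ALL=C awk '!/[^a-z0-9]/' "$tmp")
printf '%s\n' "$result"

# Clean up the temporary file.
rm -f "$tmp"
```

With the four sample lines, only abc123 and 123abc should remain.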
The above solutions will match empty lines as well. To avoid that, you can use:
awk '/^[a-z0-9]+$/' ip.txt
grep -xE '[a-z0-9]+' ip.txt
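The difference in how empty lines are handled can be checked with a small sketch. It assumes a sample input with one empty line added, and the variable names input, with_empty, and without_empty are illustrative. LC_ALL=C is again an assumption to keep [a-z] limited to ASCII lowercase letters.

```shell
# Sample input with an empty line between the valid lines (assumed data).
input=$(printf 'abc123\n\n123abc\n123ABC\n')

# Negated class: the empty line passes because it has no disallowed character.
with_empty=$(printf '%s\n' "$input" | LC_ALL=C awk '!/[^a-z0-9]/ {n++} END {print n+0}')

# Anchored class with +: at least one allowed character is required per line.
without_empty=$(printf '%s\n' "$input" | LC_ALL=C awk '/^[a-z0-9]+$/ {n++} END {print n+0}')

echo "negated: $with_empty lines, anchored: $without_empty lines"
```

The negated form should count 3 lines (including the empty one), while the anchored form should count 2.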