Printing First Variable in Awk but Only If It's Less than X

I have a file with words and I need to print only the first words that are 4 characters long or shorter, but I'm having trouble with my code. There is other text at the end of the lines, but I shortened it here.

file:

John Doe 
Jane Doe
Mark Smith
Abigail Smith
Bill Adams

What I want to do is print the names that have 4 characters or fewer.

What I've tried:

awk '$1 <= 4 {print $1}' inputfile

What I'm hoping to get:

John 
Jane
Mark
Bill

So far, I've got nothing. Either it prints out everything, with no length restriction, or it doesn't print anything at all. Could someone take a look at this and see what they think? Thanks

CodePudding user response:

First, let's understand why

awk '$1 <= 4 {print $1}' inputfile

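prints nothing for this file. $1 <= 4 compares the value of the first field, not its length. A field that does not look like a number, such as

John

is treated as a string, so AWK compares it with the string "4" character by character. Digits sort before letters, so the test is false on every line. If the comparison is forced to be numeric, for example with $1+0 <= 4, every line is printed instead. As the GNU AWK manual section Strings And Numbers puts it

A string is converted to a number by interpreting any numeric prefix of the string as numerals (...) Strings that can’t be interpreted as valid numbers convert to zero.

Therefore the numeric value of John, from GNU AWK's point of view, is zero, and 0 <= 4 is true for every line. Neither form looks at the number of characters.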
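To see how AWK treats a non-numeric field in a comparison, both forms can be run on a single sample line. This is a minimal sketch: the +0 trick for forcing numeric conversion and the LC_ALL=C setting (used to pin byte-wise string ordering) are additions for illustration and not part of the question.

```shell
# Sample line taken from the question's input.
line='John Doe'

# Plain comparison: "John" does not look like a number, so awk compares
# it with "4" as strings. LC_ALL=C pins byte-wise ordering.
as_string=$(printf '%s\n' "$line" | LC_ALL=C awk '{ print (($1 <= 4) ? "true" : "false") }')

# Adding 0 forces numeric conversion; "John" has no numeric prefix, so it becomes 0.
as_number=$(printf '%s\n' "$line" | LC_ALL=C awk '{ print (($1 + 0 <= 4) ? "true" : "false") }')

echo "string comparison:  $as_string"
echo "numeric comparison: $as_number"
```

The first test is false and the second is true, which would explain seeing either no output or every line, depending on how the condition was written.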

To get the desired output, you can use the length function, which returns the number of characters in its argument, as follows

awk 'length($1)<=4{print $1}' inputfile

or, alternatively, a regular expression that matches from 0 to 4 characters, that is

awk '$1~/^.{0,4}$/{print $1}' inputfile

where $1~ means check whether the 1st field matches, . denotes any character, {0,4} means from 0 to 4 repetitions, ^ is the beginning of the string, and $ is the end of the string (these two anchors are required, as otherwise the pattern would also match longer strings, since they contain a substring matching .{0,4})
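The effect of the anchors can be checked on a name that is too long. Some older AWK builds (for example gawk before 4.0 without --re-interval, and older mawk releases) do not support interval expressions such as {0,4}, so this sketch uses .?.?.?.? instead, which is an equivalent way of writing "up to four characters". The variable names are illustrative only.

```shell
# A 7-character name that a "4 characters or fewer" test should reject.
name='Abigail'

# .?.?.?.? means four optional characters, the same as .{0,4}.
# With ^ and $ the whole field must fit the pattern.
with_anchors=$(printf '%s\n' "$name" | awk '{ print (($1 ~ /^.?.?.?.?$/) ? "match" : "no match") }')

# Without anchors, any substring (even the empty one) satisfies the pattern.
without_anchors=$(printf '%s\n' "$name" | awk '{ print (($1 ~ /.?.?.?.?/) ? "match" : "no match") }')

echo "with anchors:    $with_anchors"
echo "without anchors: $without_anchors"
```

Only the anchored pattern rejects Abigail. If the awk in use does not understand {0,4}, the .?.?.?.? form or the length test are safer choices.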

Both commands, given this inputfile

John Doe 
Jane Doe
Mark Smith
Abigail Smith
Bill Adams

give the output

John
Jane
Mark
Bill

(tested in gawk 4.2.1)
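Since the title asks about an arbitrary limit X, the length test can also take the limit from a variable instead of hard-coding 4. A minimal sketch, assuming mktemp is available; the variable name max and the temporary file are hypothetical and stand in for the real inputfile.

```shell
# Recreate the sample input in a temporary file.
inputfile=$(mktemp)
printf '%s\n' 'John Doe' 'Jane Doe' 'Mark Smith' 'Abigail Smith' 'Bill Adams' > "$inputfile"

# -v sets the awk variable max before any input is read;
# length($1) is a number, so this comparison is numeric.
short_names=$(awk -v max=4 'length($1) <= max { print $1 }' "$inputfile")
printf '%s\n' "$short_names"

# Clean up the temporary file.
rm -f "$inputfile"
```

Changing max=4 to another value changes the cutoff without touching the awk program itself.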