Count the number of records in 1st column of a file using awk?

Time:06-18

Is there a way to count the number of records in the 1st column of a file using awk?

My file:

abc|87123
cdb|
fgytw|23321
ghft|
|87635

Expected output: 4

I tried the command below, but it's not working:

awk -F'|' 'NF==$1{c++}END {print c}' file

CodePudding user response:

You can use

awk -F\| 'length($1){c++} END{print c}'

See the online demo:

#!/bin/bash
s='abc|87123
cdb|
fgytw|23321
ghft|
|87635'
awk -F\| 'length($1){c++} END{print c}' <<< "$s"
# => 4

That is, c is incremented only when the length of field 1 is greater than zero.
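
To see which lines that condition picks up, here is a small sketch that prints the length of field 1 in front of each line of the same sample data. The variable name `lengths` is only for illustration.

```shell
s='abc|87123
cdb|
fgytw|23321
ghft|
|87635'

# Prefix each line with the length of its first field;
# a length of 0 means the line is not counted.
lengths=$(printf '%s\n' "$s" | awk -F'|' '{ print length($1) ": " $0 }')
printf '%s\n' "$lengths"
```

Only the last line reports a length of 0, so four lines are counted.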

CodePudding user response:

$ awk -F'|' '$1 != ""{c++} END{print c+0}' file
4

You need the +0 at the end to get a numeric 0 as output, instead of a blank line, when no lines match the condition.
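
A minimal sketch of that difference, assuming a made-up input in which no line has a filled first field (the variable names are illustrative):

```shell
# Hypothetical input: every line has an empty first field.
none='|1
|2'

# Without +0, the never-assigned c prints as an empty string.
plain=$(printf '%s\n' "$none" | awk -F'|' '$1 != ""{c++} END{print c}')
# With +0, the never-assigned c is converted to the number 0.
forced=$(printf '%s\n' "$none" | awk -F'|' '$1 != ""{c++} END{print c+0}')

printf 'without +0: [%s]\nwith +0:    [%s]\n' "$plain" "$forced"
```

The first pair of brackets comes out empty and the second contains 0.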

CodePudding user response:

1st solution: With your shown samples, please try the following awk code. It checks that the 1st field contains no whitespace and has a non-zero length, counts every line of Input_file where that holds, and prints the total in the END block.

awk -F'|' '$1!~/[[:space:]]/ && length($1){count++} END{print count}' Input_file

NOTE: You can change [[:space:]] to [[:blank:]] if you only want to test for spaces and tabs in the first column.
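
As a rough illustration of why a whitespace test can matter, the sketch below uses a made-up sample whose second line has only a space in the first column. It uses a slightly different condition from the answer's: it requires at least one non-blank character, rather than rejecting any field that contains whitespace.

```shell
# Hypothetical sample: the second line's first field holds only a space.
sample='abc|1
 |2
|3'

# length($1) alone treats the space-only field as filled.
by_length=$(printf '%s\n' "$sample" | awk -F'|' 'length($1){c++} END{print c+0}')
# Requiring a non-blank character skips the space-only field.
by_nonblank=$(printf '%s\n' "$sample" | awk -F'|' '$1 ~ /[^[:blank:]]/{c++} END{print c+0}')

echo "length only: $by_length, non-blank required: $by_nonblank"
```

Here the length-only test counts 2 lines, while the non-blank test counts 1.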



2nd solution: Using a combination of GNU grep and wc.

grep -oP '^\S+\|' Input_file | wc -l


3rd solution: As suggested in the comments by RARE kpoop Manifesto, one could also try the following.

awk -F'^[[:space:]]*[|]' '{ count += NF == 1 } END { print count}' Input_file

CodePudding user response:

What about this:

echo $(( $(cat test.txt | wc -l) - $(grep "^|" test.txt | wc -l) ))

To give you an idea what it means:

cat test.txt | wc -l

This counts the number of lines in the entire file. Don't use wc -l test.txt, because that also outputs the name of the file, which you don't need.

grep "^|" test.txt | wc -l

That's a neat trick: ^ means "start of line". When it is followed by the column separator, the first column is not filled in. So, grep "^|" test.txt | wc -l gives the number of lines where the first column is not filled in.

Now, how to combine both? Simply subtract one from the other inside $(( )), e.g. $((5-1)) for the sample file, which performs an integer calculation.

I admit, it looks nasty, but it does the job! :-)
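
The same subtraction can be spelled out a little more readably. This is only a sketch: the function name `count_filled_first_col` is made up, and it swaps in `wc -l < file` and `grep -c` in place of the `cat | wc -l` and `grep | wc -l` pipelines.

```shell
# Hypothetical helper; the file name is passed as the first argument.
count_filled_first_col() {
    # Reading from stdin keeps the file name out of wc's output.
    total=$(wc -l < "$1")
    # grep -c prints the number of lines that start with the separator.
    empty=$(grep -c '^|' "$1")
    echo $(( total - empty ))
}
```

For the sample file in the question, `count_filled_first_col test.txt` would print 4.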

CodePudding user response:

Another awk solution:

awk '/^[^|]/{++c} END {print c}' file

4

CodePudding user response:

$ wc -l < <(sed '/^|/d' file)
4
$ sed '/^|/d' file|sed -n '$='
4
$ grep -c "^[^|]" file
4

CodePudding user response:

Keep it simple. 3 ways of saying the same thing:

{m,g}awk '{ _+=    NF } END { print NR-_+NR }' FS='^[|]'
{m,g}awk '{ _+=!__~NF } END { print    _    }' FS='^[|]'
{m,g}awk '{ _+=/^\|/  } END { print NR-_    }' FS='^$'

4

If you don't mind loading the whole file at once, it's even easier:

 - single subtraction + gsub()
 - no tracking needed
 - input rows become "fields" in this context


{m,g}awk '$!NF = NF - gsub("(^|\n)[|]|\n$","&")' FS='\n' RS='^$'

4

Or, if you want to do it in reverse order (admittedly overkill for the task):

{m,g}awk '$!NF= gsub("[^|]+","&", $!(NF = NF))'   RS='^$' \
           OFS='|' FS='[|]([^|\n]*[|])*[^|\n]*\n' OFS='|'

4