Extract value from loc output in bash

Time:10-03

Using loc I am able to get statistics about the number of lines in the cwd:

--------------------------------------------------------------------------------
 Language             Files        Lines        Blank      Comment         Code
--------------------------------------------------------------------------------
 TypeScript             108        18640         2049         1717        14874
 JSON                     3        13293            0            0        13293
 Markdown                 8          725          183            0          542
 HTML                     1           14            3            0           11
 JavaScript               2           12            1            3            8
--------------------------------------------------------------------------------
 Total                  122        32684         2236         1720        28728
--------------------------------------------------------------------------------

I want to write a bash script to get the count of the TypeScript code in the project (14874 in this example). The regexp TypeScript.*?(\d+)$ matches the value I want but I am having difficulties using it.

I have tried two methods so far:

  • grep: Can get the whole matched line BUT cannot get the digit group
  • bash's =~: Can get groups BUT cannot use $ to match the end of the line

What is the best way to extract this value?

CodePudding user response:

You could use awk to find the TypeScript line and print the 6th column:

loc ... | awk '$1=="TypeScript" {print $6}'
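To use the value in a script, the awk output can be captured with command substitution. The following is a minimal sketch: `sample_loc_output` is a hypothetical stand-in that prints a shortened copy of the table from the question, so the example runs without loc installed, and `ts_code` is an illustrative variable name.

```shell
# Hypothetical helper: prints part of the table from the question
# in place of a real `loc` run.
sample_loc_output() {
cat <<'EOF'
--------------------------------------------------------------------------------
 Language             Files        Lines        Blank      Comment         Code
--------------------------------------------------------------------------------
 TypeScript             108        18640         2049         1717        14874
 JSON                     3        13293            0            0        13293
--------------------------------------------------------------------------------
 Total                  122        32684         2236         1720        28728
--------------------------------------------------------------------------------
EOF
}

# awk splits on runs of whitespace and ignores the leading space,
# so $1 is the language name and $6 is the Code column.
ts_code=$(sample_loc_output | awk '$1 == "TypeScript" { print $6 }')

echo "TypeScript code lines: $ts_code"
```

In a real script, `sample_loc_output` would be replaced by the actual loc invocation.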

CodePudding user response:

get the count of the TypeScript code
[...]
The regexp TypeScript.*?(\d+)$ matches the value I want

sed can extract groups from regexes using \1, \2, … in s/regex/replacement/. However, sed (just like POSIX grep and bash's [[ =~ ]]) does not support PCRE constructs like \d and .*?. But we don't actually need them here.
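As an illustration of the group syntax, here is a sketch that rewrites the regex from the question without PCRE constructs: [0-9]+ replaces \d+ and a greedy .* replaces .*?. It assumes a sed that accepts -E for extended regexes (GNU and BSD sed both do). `sample_loc_output` is a hypothetical stand-in for a real loc run.

```shell
# Hypothetical helper: prints part of the table from the question
# in place of a real `loc` run.
sample_loc_output() {
cat <<'EOF'
--------------------------------------------------------------------------------
 Language             Files        Lines        Blank      Comment         Code
--------------------------------------------------------------------------------
 TypeScript             108        18640         2049         1717        14874
 JSON                     3        13293            0            0        13293
--------------------------------------------------------------------------------
EOF
}

# -n suppresses default printing; the p flag prints only lines where
# the substitution succeeded. \1 is the text captured by ([0-9]+).
ts_code=$(sample_loc_output | sed -nE 's/^ *TypeScript.* ([0-9]+)$/\1/p')

echo "$ts_code"
```

Because .* is greedy, it consumes everything up to the last space on the line, which leaves only the final column for the group.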

You might want to let loc count only TypeScript files to save lots of unnecessary work.

loc --include '\.ts$' | sed -n 's/^ *TypeScript.* //p'

Due to the include you could even simplify the parsing to one of the following commands (even though they are a bit more cryptic):

  • grep -Eom1 '[0-9]+$'
  • sed -n '4s/.* //p'
  • awk 'NR==4 {print $NF}'
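For completeness, the bash [[ =~ ]] route mentioned earlier can also work if the output is tested one line at a time, so that $ anchors at the end of each line. This is only a sketch under that assumption: `sample_loc_output` is a hypothetical stand-in for a real loc run, and the regex uses POSIX extended syntax ([0-9]+ instead of \d+).

```shell
# Hypothetical helper: prints part of the table from the question
# in place of a real `loc` run.
sample_loc_output() {
cat <<'EOF'
--------------------------------------------------------------------------------
 Language             Files        Lines        Blank      Comment         Code
--------------------------------------------------------------------------------
 TypeScript             108        18640         2049         1717        14874
 JSON                     3        13293            0            0        13293
--------------------------------------------------------------------------------
EOF
}

# Keeping the regex in a variable avoids quoting problems inside [[ ]].
re='^ *TypeScript.* ([0-9]+)$'

ts_code=""
while IFS= read -r line; do
  if [[ $line =~ $re ]]; then
    # BASH_REMATCH[1] holds the first parenthesized group.
    ts_code=${BASH_REMATCH[1]}
    break
  fi
done < <(sample_loc_output)

echo "$ts_code"
```

This avoids an external process for the matching itself, at the cost of being longer than the awk and sed one-liners.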