Extract value from loc output in bash

Using loc I am able to get statistics about the number of lines in the cwd:

--------------------------------------------------------------------------------
 Language             Files        Lines        Blank      Comment         Code
--------------------------------------------------------------------------------
 TypeScript             108        18640         2049         1717        14874
 JSON                     3        13293            0            0        13293
 Markdown                 8          725          183            0          542
 HTML                     1           14            3            0           11
 JavaScript               2           12            1            3            8
--------------------------------------------------------------------------------
 Total                  122        32684         2236         1720        28728
--------------------------------------------------------------------------------

I want to write a bash script to get the count of the TypeScript code in the project (14874 in this example). The regexp TypeScript.*?(\d+)$ matches the value I want, but I am having difficulty using it.

I have tried two methods so far:

grep: Can get the whole matched line BUT cannot get the digit group
bash's =~: Can get groups BUT cannot use $ to match the end of the line

What is the best way to extract this value?

CodePudding user response：

You could use awk to find the TypeScript line and print the 6th column:

loc ... | awk '$1=="TypeScript" {print $6}'
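
To use the value in a script, the pipeline can be wrapped in a command substitution. The following is a minimal sketch: sample_loc_output is a hypothetical stand-in for the real loc call (its rows are copied from the table in the question) so that the example runs without loc installed, and ts_code is just an illustrative variable name.

```shell
# Hypothetical stand-in for the real `loc` call; rows copied from the question.
sample_loc_output() {
  cat <<'EOF'
--------------------------------------------------------------------------------
 Language             Files        Lines        Blank      Comment         Code
--------------------------------------------------------------------------------
 TypeScript             108        18640         2049         1717        14874
 JSON                     3        13293            0            0        13293
--------------------------------------------------------------------------------
 Total                  122        32684         2236         1720        28728
--------------------------------------------------------------------------------
EOF
}

# awk splits on runs of whitespace and ignores leading blanks,
# so $1 is the language name and $6 is the Code column.
ts_code=$(sample_loc_output | awk '$1 == "TypeScript" {print $6}')
echo "TypeScript code lines: $ts_code"
```

In a real script, sample_loc_output would be replaced by the actual loc invocation.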

CodePudding user response：

get the count of the TypeScript code
[...]
The regexp TypeScript.*?(\d+)$ matches the value I want

sed can extract groups from regexes using \1, \2, … in s/regex/replacement/. However, sed (just like POSIX grep and bash's [[ =~ ]]) does not support PCRE constructs like \d and .*?. Fortunately, we don't need them here.
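
As an illustration of the group syntax, here is a rough sketch of the regex from the question translated to POSIX extended syntax, with [0-9]+ in place of \d+ and a plain greedy .* in place of .*?. The function sample_loc_output is a hypothetical stand-in for the real loc call, using rows copied from the question, and the -E flag is assumed to be available (it is in GNU and BSD sed).

```shell
# Hypothetical stand-in for the real `loc` call; rows copied from the question.
sample_loc_output() {
  cat <<'EOF'
--------------------------------------------------------------------------------
 Language             Files        Lines        Blank      Comment         Code
--------------------------------------------------------------------------------
 TypeScript             108        18640         2049         1717        14874
 JSON                     3        13293            0            0        13293
--------------------------------------------------------------------------------
 Total                  122        32684         2236         1720        28728
--------------------------------------------------------------------------------
EOF
}

# -n suppresses the default output, -E enables extended regexes.
# The greedy .* is harmless here: the trailing " ([0-9]+) *$" pins the
# group to the last number on the line, and \1 prints only that group.
ts_code=$(sample_loc_output | sed -nE 's/^ *TypeScript.* ([0-9]+) *$/\1/p')
echo "$ts_code"
```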

You might want to let loc count only TypeScript files to save a lot of unnecessary work.

loc --include '\.ts$' | sed -n 's/^ *TypeScript.* //p'

Because of the --include filter, you could even simplify the parsing to one of the following commands (although they are a bit more cryptic):

grep -Eom1 '[0-9]+$'
sed -n '4s/.* //p'
awk 'NR==4 {print $NF}'
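
The positional variants can be compared side by side. This sketch assumes that the filtered output keeps the same layout as the table in the question, with the TypeScript row on line 4 and the Code column last. The function filtered_loc_output is a hypothetical stand-in for loc --include '\.ts$', and its Total row is an assumption rather than captured output.

```shell
# Hypothetical stand-in for `loc --include '\.ts$'`; layout assumed to match
# the table in the question, with only the TypeScript row left.
filtered_loc_output() {
  cat <<'EOF'
--------------------------------------------------------------------------------
 Language             Files        Lines        Blank      Comment         Code
--------------------------------------------------------------------------------
 TypeScript             108        18640         2049         1717        14874
--------------------------------------------------------------------------------
 Total                  108        18640         2049         1717        14874
--------------------------------------------------------------------------------
EOF
}

# grep: -o prints only the match, -m1 stops after the first matching line.
by_grep=$(filtered_loc_output | grep -Eom1 '[0-9]+$')
# sed: on line 4, delete everything up to the last space.
by_sed=$(filtered_loc_output | sed -n '4s/.* //p')
# awk: on line 4, print the last field.
by_awk=$(filtered_loc_output | awk 'NR==4 {print $NF}')

printf '%s\n' "$by_grep" "$by_sed" "$by_awk"
```

All three depend on the position of the row or column rather than on the language name, so they would break if loc changed its header or column order.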