How to use GREP or AWK to extract only fields of a matching line

I want to extract only the CVE identifiers (CVE-nnnn-nnnn) that match the regex CVE-[0-9]{4}-[0-9]{4,19} from a text file. Because several data sets were merged, the CVE appears in a different position and format on each line.

$ cat test.txt
Security updates available for Adobe Flash Player (CVE-2014-8439)
CVE-2016-4449: SUSE Linux Security Advisory
Security vulnerabilities fixed in Firefox 66:CVE-2019-9799

I can match the lines using egrep 'CVE-[0-9]{4}-[0-9]{4,19}' test.txt

How do I extract ONLY the CVE numbers, using grep or AWK, to get the following list?

CVE-2014-8439
CVE-2016-4449
CVE-2019-9799

CodePudding user response：

Adding some extra lines to the sample (multiple CVEs on one line, a variable number of digits in the last group):

$ cat test.txt
Security updates available for Adobe Flash Player (CVE-2014-8439)
CVE-2016-4449: SUSE Linux Security Advisory
Security vulnerabilities fixed in Firefox 66:CVE-2019-9799

CVE-7777-1234: SUSE fixed in Firefox 66:CVE-7777-567890
CVE-8888-14: SUSE fixed in Firefox 66:CVE-8888-090923487

One awk idea:

awk '
BEGIN { regex="CVE-[0-9]{4}-[0-9]{4,19}" }
      { while (match($0,regex)) {
              print substr($0,RSTART,RLENGTH)     # print the current match
              $0=substr($0,RSTART+RLENGTH)        # keep only the text after the match
        }
      }
' test.txt

This generates:

CVE-2014-8439
CVE-2016-4449
CVE-2019-9799
CVE-7777-1234
CVE-7777-567890
CVE-8888-090923487
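
To see what the loop relies on, here is a minimal sketch of how match() reports a hit. The helper name first_cve_position is made up for illustration, and the pattern uses + instead of {4,19} only so that it also runs on awk builds without interval expressions:

```shell
# Hypothetical helper: report where the first CVE-like token sits in each input line.
first_cve_position() {
    awk '{
        # match() returns 0 when nothing matches; otherwise it sets
        # RSTART (1-based start position) and RLENGTH (length of the match).
        if (match($0, /CVE-[0-9]+-[0-9]+/))
            print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    }'
}

echo 'fixed in Firefox 66:CVE-2019-9799' | first_cve_position
# prints: 21 13 CVE-2019-9799
```

The start of the remaining text is therefore RSTART+RLENGTH (34 here), which is the offset the loop needs when it cuts the already-printed match off the front of the line.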

CodePudding user response：

I think this is what you need:

cat file|grep -Po 'CVE-[0-9]{4}-[0-9]+'
CVE-2014-8439
CVE-2016-4449
CVE-2019-9799

or:

grep -Po 'CVE-[0-9]{4}-[0-9]+' file

grep's -o prints only the matched part of each line, and -P enables Perl-compatible regular expressions (PCRE) in GNU grep.
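
The regex from the question uses nothing beyond interval expressions, so extended regular expressions (-E) should be enough and -P is probably not required. A minimal sketch, assuming a grep that supports -o (GNU and BSD grep do; it is not part of POSIX); the function name extract_cves is hypothetical:

```shell
# Hypothetical helper: print every CVE identifier found in the given
# files (or on standard input when no file is named), one per line.
extract_cves() {
    # -E: extended regex, so {4} and {4,19} act as interval expressions
    # -o: print only the matched text, each match on its own line
    grep -Eo 'CVE-[0-9]{4}-[0-9]{4,19}' "$@"
}

printf '%s\n' \
    'Security updates available for Adobe Flash Player (CVE-2014-8439)' \
    'CVE-7777-1234: SUSE fixed in Firefox 66:CVE-7777-567890' \
    'CVE-8888-14: SUSE fixed in Firefox 66:CVE-8888-090923487' |
    extract_cves
# prints:
# CVE-2014-8439
# CVE-7777-1234
# CVE-7777-567890
# CVE-8888-090923487
```

With the {4,19} bound, a short token such as CVE-8888-14 is skipped, whereas a pattern ending in [0-9]+ would print it as well.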