Extract year, number from a column-CodePudding

Can you craft a Postgres sql query that will create a seperate column that will extract from an other column. The extraction needs to be a CVE title. Aka CVE- yyyy - xxxxx year will change and the x’s will change for the CVE. Sometimes the CVE in the other column will be in “()” sometimes it will end with “:”

We need to make sure when extracting the year(yyyy) can change, and the x's(a sequence of numbers)can vary could be 1 to 6.

The column is like

the vulnerability name (CVE-2019-0215) 
the vulnerability name (CVE-2019-0290) extra words 
the vulnerability name CVE-2018-23314: blah blah

Expected Output in a new column:

CVE-2019-0215
CVE-2019-0290
CVE 2018-23314

CodePudding user response：

If this is about the standard CVE format, extracting the year can easily be done using substring() with a regular expression:

substring('CVE-2022-1552' from 'CVE-([0-9]{4})-[0-9] ')

returns 2022

substring() will return the first matching group, so the regex "describes" the whole pattern and by using a capturing group for the year, only that will be returned.

If you need to match other formats, you need to adjust the regex accordingly.

CodePudding user response：

It is useful to layout the problem through sample data, and the expected result (ps: for this please learn some stackoverflow format capability).

sample data

CREATE TABLE mytable(
   sourcecol VARCHAR(100) NOT NULL
);
INSERT INTO mytable(sourcecol) VALUES ('the vulnerability name (CVE-2019-0215)');
INSERT INTO mytable(sourcecol) VALUES ('the vulnerability name (CVE-2019-0290) extra words');
INSERT INTO mytable(sourcecol) VALUES ('the vulnerability name CVE-2018-23314: blah blah');

query: using regex pattern in substring function

select substring(sourcecol from '(CVE-[0-9]{1,6}-[0-9]{1,6}) ')
from mytable

this regex "pattern" looks for the string starting with "CVE-" followed by 1 to 6 digits followed by "-" followed by 1 to 6 digits

result

 ---------------- 
|   substring    |
 ---------------- 
| CVE-2019-0215  |
| CVE-2019-0290  |
| CVE-2018-23314 |
 ----------------

see this dbfiddle