Rewriting a string and removing unwanted elements in Python

I use the Python library nvdlib, which extracts information from NIST. Among this information, I'm interested in the CPE entries, and especially in the API output. Here is my code:

import nvdlib
r = nvdlib.searchCVE(cveId='CVE-2019-19781')[0]

conf = r.configurations  # a list of configurations

for x in conf:
    txt = ', '.join(str(node) for node in x.nodes)  # convert the list of nodes to a string
    print(x)

Output:

{'operator': 'AND', 'negate': False, 'nodes': [{'operator': 'OR', 'negate': False, 'cpeMatch': [{'vulnerable': True, 'criteria': 'cpe:2.3:o:citrix:application_delivery_controller_firmware:10.5:*:*:*:*:*:*:*', 'matchCriteriaId': 'D56F2AAF-4658-484C-9A3A-D8A52BA5B10C'}, {'vulnerable': True, 'criteria': 'cpe:2.3:o:citrix:application_delivery_controller_firmware:11.1:*:*:*:*:*:*:*', 'matchCriteriaId': '8CE9E655-0D97-4DCF-AC2F-79DCD12770E5'}, {'vulnerable': True, 'criteria': 'cpe:2.3:o:citrix:application_delivery_controller_firmware:12.0:*:*:*:*:*:*:*', 'matchCriteriaId': '49454F7D-77B5-46DF-B95C-312AF2E68EAD'}, {'vulnerable': True, 'criteria': 'cpe:2.3:o:citrix:application_delivery_controller_firmware:12.1:*:*:*:*:*:*:*', 'matchCriteriaId': '201246D4-1E22-4F28-9683-D6A9FD0F7A6B'}, {'vulnerable': True, 'criteria': 'cpe:2.3:o:citrix:application_delivery_controller_firmware:13.0:*:*:*:*:*:*:*', 'matchCriteriaId': 'A3A50966-5554-4919-B6CE-BD8F6FF991D8'}]}, {'operator': 'OR', 'negate': False, 'cpeMatch': [{'vulnerable': False, 'criteria': 'cpe:2.3:h:citrix:application_delivery_controller:-:*:*:*:*:*:*:*', 'matchCriteriaId': '80E69E10-6F40-4FE4-9D84-F6C25EAB79D8'}]}]}

{'operator': 'AND', 'negate': False, 'nodes': [{'operator': 'OR', 'negate': False, 'cpeMatch': [{'vulnerable': True, 'criteria': 'cpe:2.3:o:citrix:netscaler_gateway_firmware:10.5:*:*:*:*:*:*:*', 'matchCriteriaId': '7E0FA8E2-3E8F-481E-8C39-FB00A9739DFC'}, {'vulnerable': True, 'criteria': 'cpe:2.3:o:citrix:netscaler_gateway_firmware:11.1:*:*:*:*:*:*:*', 'matchCriteriaId': 'A5D73B9A-59AA-4A38-AEAF-7EAB0965CD7E'}, {'vulnerable': True, 'criteria': 'cpe:2.3:o:citrix:netscaler_gateway_firmware:12.0:*:*:*:*:*:*:*', 'matchCriteriaId': 'B9F3ED0E-7F3D-477B-B645-77DA5FC7F502'}, {'vulnerable': True, 'criteria': 'cpe:2.3:o:citrix:netscaler_gateway_firmware:12.1:*:*:*:*:*:*:*', 'matchCriteriaId': '58349F8E-3177-413A-9CBE-BB454DCD31E4'}]}, {'operator': 'OR', 'negate': False, 'cpeMatch': [{'vulnerable': False, 'criteria': 'cpe:2.3:h:citrix:netscaler_gateway:-:*:*:*:*:*:*:*', 'matchCriteriaId': 'DEBB9B6A-1CAD-4D82-9B1E-939921986053'}]}]}

{'operator': 'AND', 'negate': False, 'nodes': [{'operator': 'OR', 'negate': False, 'cpeMatch': [{'vulnerable': True, 'criteria': 'cpe:2.3:o:citrix:gateway_firmware:13.0:*:*:*:*:*:*:*', 'matchCriteriaId': 'A80EAFB1-82DA-49BE-815D-D248624B442C'}]}, {'operator': 'OR', 'negate': False, 'cpeMatch': [{'vulnerable': False, 'criteria': 'cpe:2.3:h:citrix:gateway:-:*:*:*:*:*:*:*', 'matchCriteriaId': '3EF98B43-71DB-4230-B7AC-76EC2B1F0533'}]}]}

My procedure: I fetch the information, then convert the output from a list to a string with the code above (I don't know if that is the best way).

Then I delete the unneeded elements using a list: to_delet_char = ["''", '""', "{" ,"}", "vulnerable", ": True, 'criteria': ", ", : ", "'", "]", ",", "OR negate:", "operator:", "False", "cpeMatch:", "[", "]", ]

My goal is to remove everything from the output except the "cpe" values, so that I end up with a list or dictionary containing only elements of this kind:

"cpe:2.3:o:citrix:netscaler_gateway_firmware:12.0::::::"

I can delete everything else without difficulty. However, the matchCriteriaId is different each time, so I can't target it.

Is there a way, possibly with another library, either to retrieve only the cpe values or to delete everything except them, and then turn them into a list or dictionary so they can be inserted into a database?

CodePudding user response：

I think it's quite hard to delete everything else in the string, because you can't foresee what the string will contain in the future. What you can do instead is spot the pattern and search for the cpe substring.

Simply add this to your loop: on every iteration you look for the substring, then slice, split and trim to get your final output.

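s = str(x)  # work on the string form of the configuration
nIndex = s.find('cpe:')  # search for 'cpe:' so that the 'cpeMatch' key is not matched first

print(s[nIndex:].split()[0][:-2])  # take the first token and drop its trailing "',"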

This is just an example for a single line; it prints the cpe string extracted from it.

What I've shown above only handles a single cpe substring per iteration. You will probably have to find several within one iteration. You can refer to this good example of how to retrieve multiple indexes > https://stackoverflow.com/a/3873422/12128167

For storing the data, the simplest way is to use a list: declare an empty one, for example cpes = [], and then append each result to it with cpes.append("your output"). You can explore other Python collections if you intend to use them > https://www.w3schools.com/python/python_dictionaries.asp
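As an alternative to repeated find calls, all cpe substrings in one string can be collected in a single pass with a regular expression. The following is only a sketch: the names CPE_PATTERN and extract_cpes are made up for illustration, the sample is a shortened copy of the output in the question, and the pattern assumes that a cpe value never contains a quote, a comma or whitespace.

```python
import re

# Matches "cpe:2.3:" followed by everything up to the next quote, comma or whitespace.
# Assumption: cpe values themselves never contain those characters.
CPE_PATTERN = re.compile(r"cpe:2\.3:[^'\"\s,]+")


def extract_cpes(text):
    """Return every cpe string found in text, in order of appearance."""
    return CPE_PATTERN.findall(text)


# Shortened sample in the same shape as the output shown in the question.
sample = (
    "{'operator': 'AND', 'negate': False, 'nodes': ["
    "{'operator': 'OR', 'negate': False, 'cpeMatch': ["
    "{'vulnerable': True, 'criteria': 'cpe:2.3:o:citrix:gateway_firmware:13.0:*:*:*:*:*:*:*', "
    "'matchCriteriaId': 'A80EAFB1-82DA-49BE-815D-D248624B442C'}]}, "
    "{'operator': 'OR', 'negate': False, 'cpeMatch': ["
    "{'vulnerable': False, 'criteria': 'cpe:2.3:h:citrix:gateway:-:*:*:*:*:*:*:*', "
    "'matchCriteriaId': '3EF98B43-71DB-4230-B7AC-76EC2B1F0533'}]}]}"
)

cpes = extract_cpes(sample)
for cpe in cpes:
    print(cpe)
```

Because the pattern requires "cpe:2.3:", the 'cpeMatch' key is skipped, and the matchCriteriaId values never need to be targeted at all.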
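To illustrate the storage step, here is a small sketch that collects cpe strings into a list without duplicates and then groups them in a dictionary. The variable names (found, cpes, by_product) are hypothetical, and grouping by the product field is just one possible layout for a later database insert.

```python
# Example values copied from the output in the question; one is repeated on purpose.
found = [
    "cpe:2.3:o:citrix:gateway_firmware:13.0:*:*:*:*:*:*:*",
    "cpe:2.3:h:citrix:gateway:-:*:*:*:*:*:*:*",
    "cpe:2.3:o:citrix:gateway_firmware:13.0:*:*:*:*:*:*:*",
]

# List: append each new value once, keeping the original order.
cpes = []
for cpe in found:
    if cpe not in cpes:
        cpes.append(cpe)

# Dictionary: group the cpe strings by product name.
# After splitting on ":", the fields are cpe, 2.3, part, vendor, product, version, ...
by_product = {}
for cpe in cpes:
    product = cpe.split(":")[4]
    by_product.setdefault(product, []).append(cpe)

print(cpes)
print(by_product)
```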

CodePudding user response：

It seems that all you need is to check whether a string starts with "cpe:". In that case you can use the string method startswith, as shown in the following:

import nvdlib
r = nvdlib.searchCVE(cveId='CVE-2019-19781')[0]

conf = r.configurations  # a list of configurations

for x in conf:
    # join only the elements whose string form starts with "cpe:"
    txt = ', '.join(str(xx) for xx in x.nodes if str(xx).startswith("cpe:"))
    print(txt)

Note that I renamed the inner loop variable to xx so that it no longer shadows x.
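In the output shown in the question, the elements of nodes are nested structures rather than plain strings, so the startswith check has to be applied to the 'criteria' values inside each 'cpeMatch' list. The sketch below walks that structure using plain dictionaries copied from the question's output. The function name collect_cpes is made up, and whether nvdlib exposes these fields as dictionary keys or as attributes is an assumption to verify against the library, so the access syntax may need adapting.

```python
def collect_cpes(config):
    """Collect the 'criteria' values that start with "cpe:" from one configuration."""
    cpes = []
    for node in config["nodes"]:
        for match in node["cpeMatch"]:
            criteria = match["criteria"]
            if criteria.startswith("cpe:"):
                cpes.append(criteria)
    return cpes


# One configuration from the question's output, written as plain dictionaries.
config = {
    "operator": "AND",
    "negate": False,
    "nodes": [
        {
            "operator": "OR",
            "negate": False,
            "cpeMatch": [
                {
                    "vulnerable": True,
                    "criteria": "cpe:2.3:o:citrix:gateway_firmware:13.0:*:*:*:*:*:*:*",
                    "matchCriteriaId": "A80EAFB1-82DA-49BE-815D-D248624B442C",
                }
            ],
        },
        {
            "operator": "OR",
            "negate": False,
            "cpeMatch": [
                {
                    "vulnerable": False,
                    "criteria": "cpe:2.3:h:citrix:gateway:-:*:*:*:*:*:*:*",
                    "matchCriteriaId": "3EF98B43-71DB-4230-B7AC-76EC2B1F0533",
                }
            ],
        },
    ],
}

result = collect_cpes(config)
print(result)
```

Reading the values from the structure avoids string cleaning entirely, so the changing matchCriteriaId values are simply ignored.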