I have the text file below:
data:<SupplierParty
data:xmlns="xxx">
data: <cbc:CustomerAssignedAccountID schemeID="vendor-id">
data: 20750
data: </cbc:CustomerAssignedAccountID>
data: <cbc:AdditionalAccountID schemeID="cashflow:v1">151</cbc:AdditionalAccountID>
data:<SupplierParty
data:xmlns="xxx">
data: <cbc:CustomerAssignedAccountID schemeID="vendor-id">
data: 20751
data: </cbc:CustomerAssignedAccountID>
data: <cbc:AdditionalAccountID schemeID="cashflow:v1">151</cbc:AdditionalAccountID>
data:<SupplierParty
data:xmlns="xxx">
data: <cbc:CustomerAssignedAccountID schemeID="vendor-id">
data: 20752
data: </cbc:CustomerAssignedAccountID>
data: <cbc:AdditionalAccountID schemeID="cashflow:v1">151</cbc:AdditionalAccountID>
And I only want to extract the values:
20750
20751
20752
From the file.
The closest I got was:
(?<=vendor-id"\>)(.*?)(?=\<\/cbc:CustomerAssignedAccountID)
But this extracts:
data: 20751
data:
I want digits only.
How do I do this?
CodePudding user response:
I don't know which language you are using, but you can try the regex below:
(data:\s*<cbc:.*?>\s*)data:\s*(\d+)\s*(?=data:\s*</cbc:.*?>)
Below are the matches
data: <cbc:CustomerAssignedAccountID schemeID="vendor-id">
data: 20750
data: <cbc:CustomerAssignedAccountID schemeID="vendor-id">
data: 20751
data: <cbc:CustomerAssignedAccountID schemeID="vendor-id">
data: 20752
The parentheses () are there to create capture groups.
The second group, (\d+), gives you the number you need.
I don't know which language you are using, but you can easily extract the number by reading that group from each match.
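As a rough sketch of reading that group, here is how it might look in Python. Python is an assumption, since the question does not name a language, and the names sample, pattern and ids are made up for illustration. The sample string is a copy of the file from the question.

```python
import re

# Copy of the file contents from the question.
sample = '''data:<SupplierParty
data:xmlns="xxx">
data: <cbc:CustomerAssignedAccountID schemeID="vendor-id">
data: 20750
data: </cbc:CustomerAssignedAccountID>
data: <cbc:AdditionalAccountID schemeID="cashflow:v1">151</cbc:AdditionalAccountID>
data:<SupplierParty
data:xmlns="xxx">
data: <cbc:CustomerAssignedAccountID schemeID="vendor-id">
data: 20751
data: </cbc:CustomerAssignedAccountID>
data: <cbc:AdditionalAccountID schemeID="cashflow:v1">151</cbc:AdditionalAccountID>
data:<SupplierParty
data:xmlns="xxx">
data: <cbc:CustomerAssignedAccountID schemeID="vendor-id">
data: 20752
data: </cbc:CustomerAssignedAccountID>
data: <cbc:AdditionalAccountID schemeID="cashflow:v1">151</cbc:AdditionalAccountID>
'''

# Group 1 is the opening-tag line, group 2 is the run of digits.
pattern = re.compile(
    r'(data:\s*<cbc:.*?>\s*)data:\s*(\d+)\s*(?=data:\s*</cbc:.*?>)'
)

# Keep only group 2 of every match.
ids = [m.group(2) for m in pattern.finditer(sample)]
print(ids)  # ['20750', '20751', '20752']
```

The whole match still includes the tag line, so it is the group, not the full match, that holds the digits.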
CodePudding user response:
I'd do it like this:
vendor-id">[^<]*?(\d+)
The matches will be in matching group 1.
The ? after the [^<]* is important: it makes that part non-greedy, so the group captures the whole number instead of only its last digit.
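To see why the non-greedy ? matters, here is a small comparison in Python. Python is an assumption, as the question does not say which language is used, and the names sample, lazy_ids and greedy_ids are made up for illustration.

```python
import re

# Copy of the file contents from the question.
sample = '''data:<SupplierParty
data:xmlns="xxx">
data: <cbc:CustomerAssignedAccountID schemeID="vendor-id">
data: 20750
data: </cbc:CustomerAssignedAccountID>
data: <cbc:AdditionalAccountID schemeID="cashflow:v1">151</cbc:AdditionalAccountID>
data:<SupplierParty
data:xmlns="xxx">
data: <cbc:CustomerAssignedAccountID schemeID="vendor-id">
data: 20751
data: </cbc:CustomerAssignedAccountID>
data: <cbc:AdditionalAccountID schemeID="cashflow:v1">151</cbc:AdditionalAccountID>
data:<SupplierParty
data:xmlns="xxx">
data: <cbc:CustomerAssignedAccountID schemeID="vendor-id">
data: 20752
data: </cbc:CustomerAssignedAccountID>
data: <cbc:AdditionalAccountID schemeID="cashflow:v1">151</cbc:AdditionalAccountID>
'''

# With a single capture group, findall returns the contents of group 1.
# Non-greedy [^<]*? stops as soon as the digits can start.
lazy_ids = re.findall(r'vendor-id">[^<]*?(\d+)', sample)
print(lazy_ids)  # ['20750', '20751', '20752']

# Greedy [^<]* swallows the digits first and only backtracks far enough
# to leave one digit for (\d+).
greedy_ids = re.findall(r'vendor-id">[^<]*(\d+)', sample)
print(greedy_ids)  # ['0', '1', '2']
```

Because [^<] also matches line breaks, this pattern needs no multiline or dot-all flag to cross from the tag line to the line holding the number.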