I have checked the existing questions on Stack Overflow but couldn't find an answer that fits my case, so I need your help.
I have multiple strings containing URLs in different formats, for example:
1:
<p><a href='https://abcd.com/sites/WG-ProductManagementTeam/FunctionalSpecs/Forms/AllItems.aspx?id=/sites/WG-ProductManagementTeam/FunctionalSpecs/DevDOC/Enhancements to PA Peer Checklist/PA Peer Checklist (V2.3) -v10.0.pdf&parent=/sites/WG-ProductManagementTeam/FunctionalSpecs/DevDOC/Enhancements to PA Peer Checklist&p=true&ga=1'>WG-Product Management Team - PA Peer Checklist (V2.3) -v10.0.pdf - All Documents (sharepoint.com)</a></p>
2:
https://abcd.com/sites/WG-ProductManagementTeam/FunctionalSpecs/Forms/AllItems.aspx?id=/sites/WG-ProductManagementTeam/FunctionalSpecs/DevDOC/Enhancements to PA Peer Checklist/PA Peer Checklist (V2.3) -v10.0.pdf&parent=/sites/WG-ProductManagementTeam/FunctionalSpecs/DevDOC/Enhancements to PA Peer Checklist&p=true&ga=1
3:
https://abcd.com/:b:/r/sites/WG-ProductManagementTeam/FunctionalSpecs/DevDOC/Enhancements to PA Peer Checklist/PA Peer Checklist (v2.0) - v3.0.pdf?csf=1&web=1&e=txs2Yq
I want to extract this part of the URL: /DevDOC/....../.pdf
As you can see, the three URL strings above are all formatted differently, and I haven't found an efficient way to handle all of them.
I need a single approach that works for every type of URL string, extracting the same part regardless of the format.
Right now I am using the regex "./FunctionalSpecs(?!.\1)(.*?)(.pdf)", which works for URLs 2 and 3 above, but for URL 1 it returns:
/DevDOC/Enhancements to PA Peer Checklist&p=true&ga=1'>WG-Product Management Team - PA Peer Checklist (V2.3) -v10.0.pdf
which is incorrect. I wanted this:
/DevDOC/Enhancements to PA Peer Checklist/PA Peer Checklist (V2.3) -v10.0.pdf
Please help me resolve this. It seems easy, but I can't find an efficient way to do it.
I am doing this in Java.
Any help is highly appreciated. Thank you.
CodePudding user response:
You can either decode the URL and then use:
`/DevDOC/[^\.]+\.pdf`
Or, without decoding, you might want to use:
`DevDOC[^\.]+pdf`
I'm relying here on the existence of a period before the .pdf, as the regex should keep going until the first appearance of a period. If that doesn't work, you might want to use [^"].
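Since the question asks for Java, here is a minimal sketch of the same regex idea using a lazy quantifier rather than a negated character class, because the sample file names contain periods of their own (V2.3, v10.0). The class and method names are made up for illustration, and the sketch assumes the wanted fragment always starts at the first /DevDOC/ and ends at the first .pdf after it.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class DevDocPathExtractor {
    // ".*?" is lazy: it stops at the first ".pdf" after the first "/DevDOC/",
    // so the link text that follows in URL 1 is never reached.
    private static final Pattern DEVDOC_PDF =
            Pattern.compile("/DevDOC/.*?\\.pdf", Pattern.CASE_INSENSITIVE);

    // Returns the matched fragment, or an empty Optional when nothing matches.
    static Optional<String> extract(String input) {
        Matcher m = DEVDOC_PDF.matcher(input);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // URL 3 from the question.
        String url3 = "https://abcd.com/:b:/r/sites/WG-ProductManagementTeam/FunctionalSpecs/DevDOC/"
                + "Enhancements to PA Peer Checklist/PA Peer Checklist (v2.0) - v3.0.pdf?csf=1&web=1&e=txs2Yq";
        // prints: /DevDOC/Enhancements to PA Peer Checklist/PA Peer Checklist (v2.0) - v3.0.pdf
        System.out.println(extract(url3).orElse("no match"));
    }
}
```

The same call on URL 1 should return the path from the id= parameter, since the lazy match ends before &parent= is reached.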
CodePudding user response:
You can use decodeURIComponent to decode your URL and then extract the value like this:
var url = decodeURIComponent("your encoded url string");
// Lazy *? stops at the first ".pdf" rather than the last one in the string.
console.log(url.match(/DevDOC[\s\S]*?\.pdf/i));
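For Java, a rough equivalent of the decode-then-match idea could look like the sketch below. It assumes Java 10 or later (for the URLDecoder.decode overload that takes a Charset), and the class name and the encoded sample string are hypothetical. Note that URLDecoder follows form-encoding rules, so it also turns + into a space, and it throws on a stray %, which is why the sketch falls back to the raw input in that case.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class DecodedDevDocPath {
    private static final Pattern DEVDOC_PDF =
            Pattern.compile("/DevDOC/.*?\\.pdf", Pattern.CASE_INSENSITIVE);

    // Decodes percent-escapes first, then looks for the /DevDOC/....pdf fragment.
    static Optional<String> extract(String raw) {
        String decoded;
        try {
            decoded = URLDecoder.decode(raw, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            // Malformed escape (for example a literal '%' in link text): use the input as-is.
            decoded = raw;
        }
        Matcher m = DEVDOC_PDF.matcher(decoded);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up encoded input, modelled on the URLs in the question.
        String encoded = "https://abcd.com/x.aspx?id=%2Fsites%2FFunctionalSpecs%2FDevDOC%2F"
                + "PA%20Peer%20Checklist%20%28V2.3%29%20-v10.0.pdf&p=true";
        // prints: /DevDOC/PA Peer Checklist (V2.3) -v10.0.pdf
        System.out.println(extract(encoded).orElse("no match"));
    }
}
```

Decoding before matching matters when the slashes themselves are escaped as %2F, because the pattern looks for a literal /DevDOC/.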