I am a Wikipedia editor with zero regex experience. I use Wikipedia's AutoWikiBrowser to do repetitive tasks, and I often need to remove spam links from articles. Currently there is a website link that has been used in around 1500 articles, and it needs to be removed from all of them.
Wikipedia has many templates and methods for adding references to an article, but if I see one working regex, I can adapt it for the other methods as well. One of the referencing styles is:
<ref name="pakrail.com">{{Cite web |url=http://www.pakrail.com/ybook2.pdf |title=PRINCIPAL STATISTICS |access-date=18 November 2016 |archive-url=https://web.archive.org/web/20170209175303/http://www.pakrail.com/ybook2.pdf |archive-date=9 February 2017 |url-status=dead |df=dmy-all }}</ref>
In this case, I want to find an instance of www.pakrail.com between the <ref and /ref> tags, and then delete everything inside that particular pair of ref tags. In short, it should remove the whole reference, but it shouldn't delete the other valid websites/references inside other ref tags.
Thanks a lot in advance.
CodePudding user response:
Use a negative lookahead to keep the match within the tag:
<ref ((?!/ref>).)*www\.pakrail\.com((?!/ref>).)*/ref>
See live demo.
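To check the pattern outside AutoWikiBrowser, here is a minimal Python sketch. It assumes AWB's regex engine accepts the same lookahead and lookbehind syntax as Python's `re` module. The sample wikitext is made up for illustration. `STRICT_PATTERN` is a hypothetical variant that is not part of the answer above: it also accepts a bare `<ref>` tag, skips self-closing `<ref ... />` tags, and lets a reference span several lines.

```python
import re

# The answer's pattern, with the dots in the domain escaped so they match
# only a literal ".". Each ((?!/ref>).) step consumes one character, and only
# if "/ref>" does not start at that position, so a match cannot run past the
# closing tag of the reference it started in.
ANSWER_PATTERN = re.compile(r'<ref ((?!/ref>).)*www\.pakrail\.com((?!/ref>).)*/ref>')

# Hypothetical stricter variant (an assumption, not from the answer):
#   <ref\b[^>]*(?<!/)>  an opening tag, with or without attributes, that is
#                       not a self-closing <ref ... />
#   (?:(?!</ref>).)*?   any characters, stopping before the next </ref>
# re.DOTALL lets "." match newlines, so multi-line references are covered.
STRICT_PATTERN = re.compile(
    r'<ref\b[^>]*(?<!/)>(?:(?!</ref>).)*?pakrail\.com(?:(?!</ref>).)*?</ref>',
    re.DOTALL | re.IGNORECASE,
)

# Made-up wikitext: one valid reference followed by the spam reference.
simple = (
    'Opened in 1861.<ref name="bbc">{{Cite web |url=https://www.bbc.com/news '
    '|title=News}}</ref> It carries freight.<ref name="pakrail.com">{{Cite web '
    '|url=http://www.pakrail.com/ybook2.pdf |title=PRINCIPAL STATISTICS}}</ref> End.'
)

# Made-up wikitext with a self-closing reference before a multi-line spam reference.
tricky = 'A.<ref name="bbc" /> B.<ref>{{Cite web\n|url=http://www.pakrail.com/a}}</ref> C.'

# Replacing each match with an empty string deletes the whole reference.
simple_answer = ANSWER_PATTERN.sub('', simple)
simple_strict = STRICT_PATTERN.sub('', simple)
tricky_strict = STRICT_PATTERN.sub('', tricky)

print(simple_answer)
print(simple_strict)
print(tricky_strict)
```

On the simple sample both patterns remove only the pakrail reference and leave the other one alone. The stricter variant matters when a self-closing tag such as `<ref name="bbc" />` comes before the spam reference: because that tag contains no `/ref>`, a pattern that starts at any `<ref ` could begin its match there and swallow the text in between. It is worth previewing the changes on a few articles before running either pattern across all 1500.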