Function to directly find part of a document in a file without many arguments

Time:11-20

Is there a Python string method (like .findall, .find, etc.) that can directly find what is wanted? For example, if we want all the hyperlinks in an HTML file that include 'www', something like:

html.findall(www)

Of course the syntax is not right, but a single simple call without many arguments would help.

Answer:

Here is a simple example that uses the re module to find all links whose href starts with "www.":

import re

string = """<a href="stackoverflow.com">Stack Overflow</a>
<a href="github.com">Github</a>
<a href="www.google.com">Google</a>
<a href="www.madeupwebsite.com">Made Up</a>
<a href="pypi.org">PyPi</a>
"""

# Capture the href value only when it starts with "www."; [^"]* stops at the closing quote
print(re.findall(r'href="(www\.[^"]*)"', string))  # Find all non-overlapping matches
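If you want something closer to the one-call html.findall(www) from the question, and you want it to hold up against HTML quirks that a regex can miss (attribute order, upper-case tags, extra whitespace), a minimal sketch is to wrap the standard library's html.parser.HTMLParser in a small helper. The names find_links and _LinkCollector are hypothetical, not part of any library. Note that this version matches hrefs that merely contain the substring (as the question asks), not only ones that start with "www.".

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Hypothetical parser that collects href values of <a> tags containing `substring`."""

    def __init__(self, substring):
        super().__init__()
        self.substring = substring
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attrs is a list of
        # (name, value) pairs, and value is None for valueless attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and self.substring in value:
                    self.links.append(value)


def find_links(html_text, substring="www"):
    """Hypothetical helper: return every <a href> value that contains `substring`."""
    parser = _LinkCollector(substring)
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = """<a href="github.com">Github</a>
<a href="www.google.com">Google</a>
<A HREF="https://www.python.org/">Python</A>
"""
    print(find_links(page))
```

To search a file, read its contents first and pass the text in, e.g. find_links(f.read()) inside a with open(path, encoding="utf-8") as f: block, where path is whatever file you are inspecting.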