Search pattern from string

Time:10-04

I'm scraping a page with BeautifulSoup and need to get a link that sits inside a script tag, so I use this:

soup.find(string=re.compile("https://link9876.net/index.php"))

This returns the following string:

"var link = [];
 link[0] = 'https://link1225.com/x/xxxxxx';
 link[1] = 'https://link9876.net/index.php?xxxxxxxxx';
 link[2] = 'https://link1356.com/index.php?xxxxxxxxx';
 ..."

(The position and number of elements in the array change every time.)

But I only want the "https://link9876.net/index.php" link. What is the best way to extract it?

CodePudding user response:

You could run a second regular expression over the returned string to extract only the links you need, for example:

import re

script_text = """var link = [];
 link[0] = 'https://link1225.com/x/xxxxxx';
 link[1] = 'https://link9876.net/index.php?xxxxxxxx1';
 link[2] = 'https://link9876.net/index.php?xxxxxxxx2';
 link[3] = 'https://link9876.net/index.php?xxxx3xxx';
 link[4] = 'https://link1356.com/index.php?xxxxx4xxx';
 link[5] = 'https://link1356.com/index.php?xxxxx4xxx';
 ..."""
 
# The capture group returns only the URL; the lazy .*? stops at the closing quote.
for link in re.findall(r"'(https://link9876\.net/index\.php.*?)'", script_text):
    print(link)

This would give you:

https://link9876.net/index.php?xxxxxxxx1
https://link9876.net/index.php?xxxxxxxx2
https://link9876.net/index.php?xxxx3xxx
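
If the prefix you are looking for may change, the same idea can be wrapped in a small helper. This is only a sketch: the function name `extract_links` is made up for illustration, and it assumes every link in the script is wrapped in single quotes, as in the sample above. In real use, `script_text` would be the string returned by `soup.find(...)`.

```python
import re

def extract_links(script_text, prefix):
    """Return every single-quoted URL in script_text that starts with prefix."""
    # re.escape makes the dots in the prefix literal, so it need not be escaped by hand.
    pattern = re.compile(r"'(" + re.escape(prefix) + r"[^']*)'")
    return pattern.findall(script_text)

# Sample text copied from the question; normally this comes from soup.find(...).
script_text = """var link = [];
 link[0] = 'https://link1225.com/x/xxxxxx';
 link[1] = 'https://link9876.net/index.php?xxxxxxxxx';
 link[2] = 'https://link1356.com/index.php?xxxxxxxxx';
"""

links = extract_links(script_text, "https://link9876.net/index.php")
# Take the first match, or None when the script has no such link.
first_link = links[0] if links else None
print(first_link)
```

Because the helper returns a list, it does not matter where in the array the link appears or how many entries there are; an empty list simply means no matching link was found.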