How to select all links of apps from the App Store and extract their href?

Time:05-29

from bs4 import BeautifulSoup
import requests

url = 'https://www.apple.com/kr/search/youtube?src=globalnav'
response = requests.get(url)
html = response.text
soup = BeautifulSoup(html, 'html.parser')
# Class name taken from the browser's developer tools
links = soup.select(".rf-serp-productname-list")
print(links)  # empty list in my case

I want to crawl all the links of the apps that are shown. When I search for a keyword, I expected links = soup.select(".rf-serp-productname-list") to work, but the links list is empty.

What should I do?

Answer:

Just check this code; I think it is what you want:

import re
import requests
from bs4 import BeautifulSoup

pages = set()  # hrefs that have already been visited

def get_links(page_url):
    # Match only site-relative links, i.e. hrefs that start with "/"
    pattern = re.compile(r"^/")
    # Replace your_URL with the site's base URL; f-strings require Python 3.6+
    html = requests.get(f"your_URL{page_url}").text
    soup = BeautifulSoup(html, "html.parser")
    for link in soup.find_all("a", href=pattern):
        new_page = link.attrs["href"]
        if new_page not in pages:
            print(new_page)
            pages.add(new_page)
            get_links(new_page)  # recurse into the newly found page

get_links("")

Source: https://gist.github.com/AO8/f721b6736c8a4805e99e377e72d3edbf
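A sketch of the same idea without recursion, assuming you want to stay on one site: a queue replaces the recursive call, so a deep site cannot hit Python's recursion limit, and max_pages (a hypothetical parameter) caps the crawl. The fetch argument is also an assumption, added so the crawler can be tried on a dict of fake pages instead of the network. Note that it visits pages breadth-first, whereas the recursive version above goes depth-first.

```python
import re
from collections import deque
from typing import Callable, List, Set

from bs4 import BeautifulSoup

# Same idea as above: only site-relative links that start with "/"
PATTERN = re.compile(r"^/")


def crawl(fetch: Callable[[str], str], start: str = "", max_pages: int = 100) -> List[str]:
    """Breadth-first crawl of site-relative links.

    fetch is a hypothetical helper: it takes a path and returns that page's HTML.
    Returns the discovered hrefs in the order they were found.
    """
    seen: Set[str] = set()
    order: List[str] = []
    queue = deque([start])
    while queue and len(order) < max_pages:
        path = queue.popleft()
        soup = BeautifulSoup(fetch(path), "html.parser")
        for link in soup.find_all("a", href=PATTERN):
            href = link["href"]
            # Record each new href once and schedule it for a later visit
            if href not in seen and len(order) < max_pages:
                seen.add(href)
                order.append(href)
                queue.append(href)
    return order
```

For real use, pass something like lambda path: requests.get("your_URL" + path, timeout=10).text as fetch.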

I think you can change this part to check for a keyword:

for link in soup.find_all("a", href=pattern):
    # do something, e.g. check whether link.attrs["href"] contains your keyword
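One way to fill in that loop, assuming "keyword" means a case-insensitive substring of the href (a hypothetical choice; you could match the link text instead). The function name links_with_keyword is made up for illustration.

```python
import re
from typing import List

from bs4 import BeautifulSoup

pattern = re.compile(r"^/")  # site-relative links only, as in the answer


def links_with_keyword(html: str, keyword: str) -> List[str]:
    """Return site-relative hrefs that contain keyword, ignoring case."""
    soup = BeautifulSoup(html, "html.parser")
    matches = []
    for link in soup.find_all("a", href=pattern):
        href = link["href"]
        # Keep only links whose href mentions the keyword
        if keyword.lower() in href.lower():
            matches.append(href)
    return matches
```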

Answer:

You are cooking a soup, so first of all taste it and check whether it contains everything you expect.

The ResultSet of your selection is empty because the structure of the HTML in the response differs a bit from what you expected based on the browser's developer tools.
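A small sketch of "tasting the soup": before trusting a selector copied from the developer tools, check whether the class name occurs in the raw HTML that requests actually received, and how many elements the selector matches. The helper name taste is hypothetical; in the question you would call it as taste(response.text, "rf-serp-productname-list"). Printing soup.prettify() and searching it by eye works too.

```python
from bs4 import BeautifulSoup


def taste(html: str, class_name: str) -> dict:
    """Check a CSS class against the HTML that was actually downloaded."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        # Does the class name appear anywhere in the raw response text?
        "in_raw_html": class_name in html,
        # How many elements does the ".class_name" selector match?
        "matches": len(soup.select("." + class_name)),
    }
```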

To get the list of links, use a more specific selector:

links = [a.get('href') for a in soup.select('a.icon')]  

Output:

['https://apps.apple.com/kr/app/youtube/id544007664', 'https://apps.apple.com/kr/app/쿠팡플레이/id1536885649', 'https://apps.apple.com/kr/app/youtube-music/id1017492454', 'https://apps.apple.com/kr/app/instagram/id389801252', 'https://apps.apple.com/kr/app/youtube-kids/id936971630', 'https://apps.apple.com/kr/app/youtube-studio/id888530356', 'https://apps.apple.com/kr/app/google-chrome/id535886823', 'https://apps.apple.com/kr/app/tiktok-틱톡/id1235601864', 'https://apps.apple.com/kr/app/google/id284815942']
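If you only want app pages, a sketch that builds on the a.icon selector: it skips icons without an href, keeps hrefs whose path contains /app/, drops duplicates while preserving page order, and pulls out the numeric id. The URL shape (.../app/<name>/id<digits>) is an assumption based on the links in the output above, and app_links is a hypothetical name.

```python
import re
from typing import List, Tuple

from bs4 import BeautifulSoup

# Assumes app URLs look like https://apps.apple.com/<country>/app/<name>/id<digits>
APP_ID = re.compile(r"/id(\d+)")


def app_links(html: str) -> List[Tuple[str, str]]:
    """Return (href, app_id) pairs for a.icon links, deduplicated in page order."""
    soup = BeautifulSoup(html, "html.parser")
    seen = set()
    result = []
    for a in soup.select("a.icon"):
        href = a.get("href")
        # Skip icons without an href, non-app links, and repeats
        if not href or "/app/" not in href or href in seen:
            continue
        seen.add(href)
        match = APP_ID.search(href)
        result.append((href, match.group(1) if match else ""))
    return result
```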