How to extract data using Beautiful Soup

Time:11-11

import requests
from bs4 import BeautifulSoup
import pandas as pd
baseurl='https://locations.atipt.com/'
headers ={
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
r =requests.get('https://locations.atipt.com/al')
soup=BeautifulSoup(r.content, 'html.parser')
tra = soup.find_all('ul',class_='list-unstyled')
productlinks=[]
for links in tra:
    for link in links.find_all('a',href=True):
        comp=baseurl + link['href']
        productlinks.append(comp)

for link in productlinks:
    r =requests.get(link,headers=headers)
    soup=BeautifulSoup(r.content, 'html.parser')
    tag=soup.find_all('div',class_='listing content-card')
    for pro in tag:
        tup=pro.find('a',class_='name').find_all('p')
        for i in tup:
            print(i.get_text())

I am trying to extract the text from the p tags, but my code prints nothing. This is the page I am trying to extract the p tag data from: https://locations.atipt.com/al/alabaster
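One possible cause, offered only as a guess: if the p tags are not nested inside the `a.name` element, then `find('a', class_='name').find_all('p')` returns an empty list and the loop prints nothing. The sketch below shows that effect on hypothetical markup. The HTML, the clinic name, and the variable names `inside_link` and `inside_card` are assumptions for illustration, not copied from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not the real page): the address
# paragraphs sit in a sibling div, not inside the <a class="name"> link.
SAMPLE_HTML = """
<div class="listing content-card">
  <div><a class="name" href="#">Example Clinic</a></div>
  <div><p>634 1st Street N</p><p>Alabaster, AL 35007</p></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
card = soup.find('div', class_='listing content-card')

# Search only inside the link, as the code in the question does.
inside_link = card.find('a', class_='name').find_all('p')

# Search the whole card instead.
inside_card = card.find_all('p')

print(len(inside_link))  # 0
print(len(inside_card))  # 2
```

Printing the length of each intermediate result like this is a quick way to see which step of a chained lookup stops matching.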

CodePudding user response:

Here is a working solution that uses CSS selectors to get the text from the p tags:

import requests
from bs4 import BeautifulSoup
import pandas as pd
baseurl = 'https://locations.atipt.com/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
r = requests.get('https://locations.atipt.com/al')
soup = BeautifulSoup(r.content, 'html.parser')
tra = soup.find_all('ul', class_='list-unstyled')
productlinks = []
for links in tra:
    for link in links.find_all('a', href=True):
        comp = baseurl + link['href']
        productlinks.append(comp)

for link in productlinks:
    r = requests.get(link, headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    tag = ''.join([x.get_text(strip=True).replace('\xa0','') for x in soup.select('div.listing.content-card div:nth-child(2)>p')])
    print(tag)

Output:

634 1st Street NSte 100Alabaster, AL35007
9256 Parkway ESte ABirmingham, AL352061940 28th Ave SBirmingham, AL352095431 Patrick WaySte 101Birmingham, AL35235833 St. Vincent's DrSte 100Birmingham, AL352051401 Doug Baker BlvdSte 104Birmingham, AL35242
1877 Cherokee Ave SWCullman, AL350551301-A Bridge Creek Dr NECullman, AL35055
1821 Beltline Rd SWSte BDecatur, AL35601
4825 Montgomery HwySte 103Dothan, AL36303
550 Fieldstown RdGardendale, AL35071323 Fieldstown Rd, Ste 105Gardendale, AL35071
2804 John Hawkins PkwySte 104Hoover, AL35244
700 Pelham Rd NorthJacksonville, AL36265
1811 Hwy 78 ESte 108 & 109Jasper, AL35501-4081
76359 AL-77Ste CLincoln, AL35096
1 College DriveStation #14Livingston, AL35470
106 6th Street SouthSte AOneonta, AL35121-1823
50 Commons WaySte DOxford, AL36203
301 Huntley PkwyPelham, AL35124
41 Eminence WaySte BPell City, AL35128
124 W Grand AveSte A-4Rainbow City, AL35906
1147 US-231Ste 9 & 10Troy, AL36081
7201 Happy Hollow RdTrussville, AL35173
100 Rice Mine Road LoopSte 102Tuscaloosa, AL354061451 Dr. Edward Hillard DrSte 130Tuscaloosa, AL35401
3735 Corporate Woods DrSte 109Vestavia, AL35242-2296
636 Montgomery HwyVestavia Hills, AL352161539 Montgomery HwySte 111Vestavia Hills, AL35216
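In the output above the address parts run together because the text of every p tag is joined with an empty string and the non-breaking space is removed. A minimal sketch of one way to keep the parts readable follows: it builds one string per listing card and joins the p texts with a comma. The sample HTML and the helper name `extract_addresses` are assumptions for illustration; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption) shaped to match the selector
# 'div.listing.content-card div:nth-child(2)>p' used in the answer.
SAMPLE_HTML = """
<div class="listing content-card">
  <div><a class="name" href="#">Example Clinic A</a></div>
  <div><p>634 1st Street N</p><p>Ste 100</p><p>Alabaster, AL&nbsp;35007</p></div>
</div>
<div class="listing content-card">
  <div><a class="name" href="#">Example Clinic B</a></div>
  <div><p>550 Fieldstown Rd</p><p>Gardendale, AL&nbsp;35071</p></div>
</div>
"""

def extract_addresses(html):
    """Return one comma-separated address string per listing card."""
    soup = BeautifulSoup(html, 'html.parser')
    addresses = []
    for card in soup.select('div.listing.content-card'):
        # Turn the non-breaking space into a normal space instead of dropping it.
        parts = [p.get_text(strip=True).replace('\xa0', ' ')
                 for p in card.select('div:nth-child(2) > p')]
        addresses.append(', '.join(parts))
    return addresses

for address in extract_addresses(SAMPLE_HTML):
    print(address)
```

Collecting one string per card also makes it straightforward to load the results into a pandas DataFrame later, since each address is already a separate list item.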