Want to scrape each category individually, but it either scrapes the data in single-alphabet form or in a p

Time:09-30

I want to extract the name and position, education, contact number, and email into separate columns of a CSV file. When I extract them, I get either one cell per letter or one column per paragraph (if I wrap the text in a list). Here is the code:

import requests
from bs4 import BeautifulSoup
from csv import writer

url = 'https://governors.pwcs.edu/about_us/staff_bios_and_contact_information'
req = requests.get(url)

soup = BeautifulSoup(req.text, 'lxml')
page = soup.find_all('p')

for i in page:
   i = i.text
   with open('page.csv', 'a', encoding = 'utf8', newline='') as f:
       thewriter = writer(f)
       thewriter.writerow(i)
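
The one-letter-per-cell output comes from how csv.writer.writerow treats its argument: it expects a sequence of fields, and a plain string is a sequence of characters, so each character becomes its own column. Wrapping the string in a list gives a single column holding the whole paragraph. A minimal sketch of both cases, using an in-memory buffer and made-up sample values in place of the page text (written_line is a hypothetical helper, not part of the original code):

```python
import csv
import io


def written_line(row):
    """Write one row with csv.writer and return the resulting CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue()


# A bare string is iterated character by character: one column per letter.
as_string = written_line("Jane")

# A list of fields produces one column per list item.
as_fields = written_line(["Jane Doe", "555-123-4567"])

print(repr(as_string))
print(repr(as_fields))
```

So the text of each paragraph has to be split into separate fields before it is passed to writerow.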

CodePudding user response:

You can use regex to pull out what you need:

import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

url = 'https://governors.pwcs.edu/about_us/staff_bios_and_contact_information'
req = requests.get(url)

soup = BeautifulSoup(req.text, 'html.parser')
content = soup.find('div', {'id':'divContent'})
p_list = content.find_all('p')

rows = []
for p in p_list:
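    string = p.text
    # Raw string keeps \d and \s intact; [a-zA-Z0-9] replaces the mistyped [a-zA-z1-9].
    match = re.search(r'(^.*) (Education: )(.*)( Contact).*(\d{3}-\d{3}-\d{4})\s*([a-zA-Z0-9].*@[\w].*\.[\w].*)', string)
    if match is None:
        # Skip paragraphs that do not look like a staff bio.
        continue
    text = match.groups()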
    
    name = text[0]
    edu = text[2]
    phone = text[4]
    email = text[5]
    
    row = {
        'name':name,
        'education':edu,
        'phone':phone,
        'email':email}
    
    rows.append(row)
    
df = pd.DataFrame(rows)
df.to_csv('page.csv', index=False)
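
To check the pattern without fetching the page, it can be run against a sample string. The bio text below is hypothetical and only assumes the layout the regex implies (name and position, then "Education:", then a "Contact" section with a phone number and an email); the real page may differ. The helper name parse_bio is also made up for this sketch, and the character class is written as [a-zA-Z0-9].

```python
import re

# Same pattern as above, compiled once as a raw string.
BIO_PATTERN = re.compile(
    r'(^.*) (Education: )(.*)( Contact).*(\d{3}-\d{3}-\d{4})\s*'
    r'([a-zA-Z0-9].*@[\w].*\.[\w].*)'
)


def parse_bio(text):
    """Return a dict of fields for one bio paragraph, or None if it does not match."""
    match = BIO_PATTERN.search(text)
    if match is None:
        return None
    groups = match.groups()
    return {
        'name': groups[0],
        'education': groups[2],
        'phone': groups[4],
        'email': groups[5],
    }


# Hypothetical sample text, not taken from the real page.
sample = ('Jane Doe, Director Education: B.S., Example University '
          'Contact Information: 555-123-4567 jane.doe@example.org')

parsed = parse_bio(sample)
skipped = parse_bio('Welcome to our staff page.')
print(parsed)
print(skipped)
```

Returning None for non-matching paragraphs lets the loop skip headings or intro text instead of raising an AttributeError on .groups().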