Hello, I am new to web scraping. I scraped a site, but when I write the result to a CSV file, all of the information ends up in a single cell. I want the information written row by row. It does not matter if everything is in a single column, but each item must be in its own row. Here is the code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re
from csv import writer

url = 'https://virtualhs.pwcs.edu/about/faculty'
res = requests.get(url)
soup = BeautifulSoup(res.content, 'lxml')
Title = soup.find('div', id='divContent')
if Title:
    for p in Title.select("p"):
        p.extract()
    for h2 in Title.select("h2"):
        h2.extract()
    Title = Title.text
    print(Title)
with open('now.csv', 'w', encoding='utf-8', newline='') as f:
    thewriter = writer(f)
    thewriter.writerow([Title])
CodePudding user response:
It's not clear what you are trying to scrape. Is it the teachers and their departments?
In your code, writerow([Title]) receives a list containing a single string, so the whole text ends up in one cell.
If you open the file with 'w' inside a loop, it is overwritten on each iteration. You would need to use 'a' to append on each iteration, but you also need to make sure you first write an initial "blank" CSV to append to.
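A minimal sketch of that 'w' then 'a' pattern, in case it helps. The sample rows, the column names, and the temporary file location are made up for illustration; they are not taken from the page.

```python
import csv
import os
import tempfile

# Hypothetical sample rows standing in for scraped data.
sample_rows = [
    ("Teacher A", "Department X"),
    ("Teacher B", "Department Y"),
]

# Write into a temporary directory so the sketch does not touch real files.
csv_path = os.path.join(tempfile.mkdtemp(), "now.csv")

# Step 1: create the file once with 'w' and write only the header row.
with open(csv_path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerow(["teacher", "department"])

# Step 2: reopen with 'a' on each iteration so earlier rows are kept.
for teacher, department in sample_rows:
    with open(csv_path, "a", encoding="utf-8", newline="") as f:
        csv.writer(f).writerow([teacher, department])

# Read the file back to check what was written.
with open(csv_path, encoding="utf-8", newline="") as f:
    written = list(csv.reader(f))
```

Opening the file once and calling writerow inside the loop would also work and avoids reopening the file; the version above just mirrors the 'w'/'a' description.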
Personally, I think it's just easier to construct a dataframe, then write that to file:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://virtualhs.pwcs.edu/about/faculty'
res = requests.get(url)
soup = BeautifulSoup(res.content, 'html.parser')

# Treat every <h3> as a department heading.
departments = soup.find_all('h3')

rows = []
for department in departments:
    # The first <ul> after the heading holds that department's teachers.
    for teacher in department.find_next('ul').find_all('li'):
        row = {
            'teacher': teacher.text,
            'department': department.text}
        rows.append(row)

# One dict per teacher becomes one row in the CSV.
df = pd.DataFrame(rows)
df.to_csv('now.csv', index=False)
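If you would rather stay with the csv module from the question, one possible approach is to split the extracted text into lines and write each non-empty line as its own row. This is only a sketch: scraped_text is a hypothetical stand-in for the value of Title.text, and an in-memory buffer is used instead of now.csv so nothing is written to disk.

```python
import csv
import io

# Hypothetical text standing in for the value of Title.text.
scraped_text = "\nEnglish\nTeacher A\nTeacher B\n\nMath\nTeacher C\n"

# Keep only non-empty lines; each one becomes a single-column row.
lines = [line.strip() for line in scraped_text.splitlines() if line.strip()]

# Write to an in-memory buffer; a real file opened with newline='' works the same way.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows([line] for line in lines)

csv_text = buffer.getvalue()
```

The difference from writerow([Title]) is that writerows receives one list per line, so each line gets its own row rather than sharing one cell.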