import requests
from bs4 import BeautifulSoup

url = "https://www.volvogroup.com/en/news-and-media/press-releases.html"
source = requests.get(url)
soup = BeautifulSoup(source.text, "html.parser")
# Print the href of every <a> found inside a <p>.
for i in soup.find_all('p'):
    for j in i.find_all('a'):
        href = j.get('href')
        print(href)
I am able to fetch the links in href here. But when I build a DataFrame like this using a list comprehension, I don't get the same output in the DataFrame:
check = soup.find_all('p', class_="articlelist__headerTitle")
for i in range(len(check)):
    df.loc[i, 'company_id'] = 'Volvo_AB'
    df.loc[i, 'links'] = [i.a.get('href') for i in soup.find_all('p')]
print(df)
CodePudding user response:
A few of the p elements in the list don't contain an a tag, so i.a is None for those elements. You need to filter them out. Try the code below.
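check = soup.find_all('p', class_="articlelist__headerTitle")
# Collect the links once, skipping any <p> that has no <a> child.
values = [i.a.get('href') for i in check if i.a is not None]
for i in range(len(values)):
    df.loc[i, 'company_id'] = 'Volvo_AB'
    # Assign one link per row rather than the whole list to a single cell.
    df.loc[i, 'links'] = values[i]
print(df)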
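To see the filtering in isolation, here is a minimal sketch that runs against a small inline HTML snippet instead of the live page. The sample markup, the paths in it, and the helper name extract_links are made up for illustration; only the class name articlelist__headerTitle and the company id come from the question. It also builds the DataFrame in one step rather than cell by cell, which is an alternative to the loop, not the approach used above.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the second <p> has no <a>.
SAMPLE_HTML = """
<p class="articlelist__headerTitle"><a href="/en/news/one.html">One</a></p>
<p class="articlelist__headerTitle">No link here</p>
<p class="articlelist__headerTitle"><a href="/en/news/two.html">Two</a></p>
"""


def extract_links(html):
    """Return the href of the first <a> in each headline <p>, skipping <p> tags without one."""
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = soup.find_all("p", class_="articlelist__headerTitle")
    # p.a is None when the paragraph has no <a> child, so filter those out first.
    return [p.a.get("href") for p in paragraphs if p.a is not None]


links = extract_links(SAMPLE_HTML)
# Build the whole frame at once; the scalar company_id is repeated for every row.
df = pd.DataFrame({"company_id": "Volvo_AB", "links": links})
print(df)
```

Without the `if p.a is not None` guard, the second paragraph would raise an AttributeError, because None has no get method.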
CodePudding user response:
import pandas as pd

check = soup.find_all('p', class_="articlelist__headerTitle")
# Build the link column directly from the headline paragraphs.
df_news = pd.DataFrame(columns=['link'], data=[url.a.get('href') for url in check])
for i in range(len(check)):
    df_news.loc[i, 'company_id'] = 'Volvo_AB'
print(df_news)