I am trying out web scraping using BeautifulSoup.
I only want to extract the content from this webpage: basically everything in the Barry Kripke article, without all the headers, etc.
Next I tried to get all the links, but that didn't work either - I got only 2 links:
for t in article.find_all('a'):
    print(t)
Can someone please help me with this?
CodePudding user response:
You only grab and print out the 1st <p> tag with article = soup.find_all(article_tag)[0].
You need to go through all the <p> tags:
import requests
from bs4 import BeautifulSoup

url = 'https://bigbangtheory.fandom.com/wiki/Barry_Kripke'
r = requests.get(url)
if r.status_code == 200:
    page = r.text
    print('Type of the variable \'page\':', page.__class__.__name__)
    print('Page Retrieved. Request Status: %d, Page Size: %d' % (r.status_code, len(page)))
else:
    # Stop here: 'page' is never assigned when the request fails
    raise SystemExit('Some problem occurred. Request Status: %s' % r.status_code)

soup = BeautifulSoup(page, 'html.parser')
print('Type of the variable \'soup\':', soup.__class__.__name__)
print(soup.prettify()[:1000])

# find_all returns every <p> tag; loop over all of them instead of taking [0]
article_tag = 'p'
articles = soup.find_all(article_tag)
for p in articles:
    print(p.text)
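The same difference explains why only 2 links came back: find_all('a') was called on a single paragraph. Here is a minimal offline sketch of that effect. The HTML snippet and the variable names are made up for illustration, and no request to the wiki is made.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (not the real wiki markup)
html = """
<div id="content">
  <p>First paragraph with a <a href="/wiki/Caltech">link</a>.</p>
  <p>Second paragraph with <a href="/wiki/Penny">another</a> link.</p>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# Indexing with [0] keeps only the first <p>, so only its links are visible
first_only = soup.find_all('p')[0]
links_in_first = [a['href'] for a in first_only.find_all('a')]

# Looping over every <p> reaches the text and links of all paragraphs
all_paragraphs = soup.find_all('p')
texts = [p.get_text() for p in all_paragraphs]
links_in_all = [a['href'] for p in all_paragraphs for a in p.find_all('a')]

print(links_in_first)
print(links_in_all)
```

On the real page the counts will differ, but the pattern should be the same: search inside every paragraph, or inside the content container, rather than inside the first match.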
CodePudding user response:
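The following code collects the text of the first 31 paragraphs and the first 31 links found directly inside paragraphs, and puts them side by side in a DataFrame. The pairing is by position, so the link on a row does not necessarily come from the paragraph on that row.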
import requests
from bs4 import BeautifulSoup
import pandas as pd

r = requests.get('https://bigbangtheory.fandom.com/wiki/Barry_Kripke')
soup = BeautifulSoup(r.text, 'lxml')
content = soup.select_one('div#content')

# Text of the first 31 paragraphs
pragraph = [x.get_text(strip=True) for x in content.find_all('p')][0:31]
#print(pragraph)

# First 31 links that are direct children of a paragraph
URL = []
for link in content.select('p > a')[0:31]:
    href = link.get('href')
    # hrefs on this page start with '/', so the joined URL contains '//'
    links = ('https://bigbangtheory.fandom.com/' + href) if href else None
    URL.append(links)

df = pd.DataFrame(data=list(zip(pragraph, URL)), columns=['pragraph', 'URL'])
print(df)
Output:
pragraph URL
0 Barry KripkeAdultYoung AdultGeneral Informatio... https://bigbangtheory.fandom.com//wiki/Beverly...
1 Beverly Hofstadter(romantic interest) https://bigbangtheory.fandom.com//wiki/Caltech
2 Barry Kripke, Ph.D. is aCaltechplasma-physicis... https://bigbangtheory.fandom.com//wiki/String_...
3 Kripke has no appearances in any episodes ofSe... https://bigbangtheory.fandom.com//wiki/Leonard...
4 In his first appearance, he pitted his "kiwwa ... https://bigbangtheory.fandom.com//wiki/Sheldon...
5 He continued to appear inSeason 3, where he mo... https://bigbangtheory.fandom.com//wiki/Leonard...
6 A year later, he was invited toSheldon and Leo... https://bigbangtheory.fandom.com//wiki/Howard_...
7 In this season, Barry tells Sheldon, who still... https://bigbangtheory.fandom.com//wiki/Rajesh_...
8 In "The Rothman Disintegration", Barry argues ... https://bigbangtheory.fandom.com//wiki/Amy_Far...
9 Kripke talking to Sheldon. https://bigbangtheory.fandom.com//wiki/Season_1
10 Kripke returned in "The Cooper/Kripke Inversio... https://bigbangtheory.fandom.com//wiki/Kripke_...
11 During the events of "The Tenure Turbulence," ... https://bigbangtheory.fandom.com//wiki/M.O.N.T.E.
12 After Sheldon retracts his paper on the existe... https://bigbangtheory.fandom.com//wiki/Caltech
13 In "The Relationship Diremption", Barry taunts... https://bigbangtheory.fandom.com//wiki/The_Kil...
14 In "The Champagne Reflection", Barry is seen a... https://bigbangtheory.fandom.com//wiki/Penny
15 In "The Comic Book Store Regeneration", Barry ... https://bigbangtheory.fandom.com//wiki/Howard_...
16 In "The Perspiration Implementation", Barry Kr... https://bigbangtheory.fandom.com//wiki/Penny
17 In the following episode "The Helium Insuffici... https://bigbangtheory.fandom.com//wiki/The_Fri...
18 In "The Valentino Submergence", he interruptsF... https://bigbangtheory.fandom.com//wiki/Season_3
19 In "The Celebration Experimentation", Kripke a... https://bigbangtheory.fandom.com//wiki/The_Ele...
20 He reappears in "The Geology Elevation", where... https://bigbangtheory.fandom.com//wiki/Sheldon...
21 He is mentioned in "The Allowance Evaporation"... https://bigbangtheory.fandom.com//wiki/The_Caf...
22 In "The Tesla Recoil", after learning Sheldon ... https://bigbangtheory.fandom.com//wiki/Leonard...
23 He reappeared in "The Athenaeum Allocation".Sh... https://bigbangtheory.fandom.com//wiki/Rajesh_...
24 In "The Bow Tie Asymmetry", Barry appears asSh... https://bigbangtheory.fandom.com//wiki/Sheldon...
25 He appeared in "The Grant Allocation Derivatio... https://bigbangtheory.fandom.com//wiki/Preside...
26 InThe Plagiarism Schism, Kripke offe... https://bigbangtheory.fandom.com//wiki/The_Ven...
27 In the penultimate episode "The Change Constan... https://bigbangtheory.fandom.com//wiki/Apartme...
28 "We’re all pathetic and cweepy and can’t get g... https://bigbangtheory.fandom.com//wiki/Zack_Jo...
29 Despite being a stereotypical geek, Barry appa... https://bigbangtheory.fandom.com//wiki/Stuart_...
30 Kripke is often suggested to be coasting in hi... https://bigbangtheory.fandom.com//wiki/LeVar_B...
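Two things stand out in the output above: the URLs contain "//wiki/", and the link on a row is matched to the paragraph only by position. One possible way to handle both is sketched below, using urllib.parse.urljoin from the standard library and collecting the links per paragraph. The HTML snippet and the helper name paragraphs_with_links are hypothetical, made up for illustration; the real page markup may need a different selector.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE = 'https://bigbangtheory.fandom.com/'

# Hypothetical stand-in for the downloaded page (not the real wiki markup)
html = """
<div id="content">
  <p>Barry works at <a href="/wiki/Caltech">Caltech</a> with <a href="/wiki/Sheldon_Cooper">Sheldon</a>.</p>
  <p>A paragraph without any links.</p>
</div>
"""

def paragraphs_with_links(markup, base=BASE):
    """Return a list of (paragraph text, [absolute link URLs]) tuples."""
    soup = BeautifulSoup(markup, 'html.parser')
    rows = []
    for p in soup.select('div#content p'):
        # urljoin resolves '/wiki/...' against the base without doubling the slash
        links = [urljoin(base, a['href']) for a in p.find_all('a', href=True)]
        # Plain get_text() keeps the spaces around link text
        rows.append((p.get_text().strip(), links))
    return rows

rows = paragraphs_with_links(html)
for text, links in rows:
    print(text, links)
```

Using get_text() without strip=True also avoids the glued words visible in the output above (for example "aCaltechplasma-physicis..."), because strip=True trims each text fragment before joining them with no separator.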