I am trying to get the titles of the games, but along with each title I am also getting the span text.
Here is my code:
import time
import requests,pandas
from bs4 import BeautifulSoup
r = requests.get("https://www.pocketgamer.com/android/best-horror-games/?page=1", headers=
{'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'})
c = r.content
bs4 = BeautifulSoup(c,"html.parser")
all = bs4.find_all("h3",{"class":"indent"})
print(all)
Output
[<h3 >
<div><span>1</span></div>
Fran Bow </h3>, <h3 >
<div><span>2</span></div>
Bendy and the Ink Machine </h3>, <h3 >
<div><span>3</span></div>
Five Nights at Freddy's </h3>, <h3 >
<div><span>4</span></div>
Sanitarium </h3>, <h3 >
<div><span>5</span></div>
OXENFREE </h3>, <h3 >
<div><span>6</span></div>
Thimbleweed Park </h3>, <h3 >
<div><span>7</span></div>
Samsara Room </h3>, <h3 >
I also tried this, but it does not work:
#all = all.find_all("h3")[0].text
CodePudding user response:
Here is a minimal working solution.
Code:
import time
import requests,pandas
from bs4 import BeautifulSoup
r = requests.get("https://www.pocketgamer.com/android/best-horror-games/?page=1", headers=
{'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'})
c = r.content
bs4 = BeautifulSoup(c,"html.parser")
all = bs4.find_all("h3",{"class":"indent"})
for title in all:
    print(' '.join(title.text.split()[1:]))
Output:
Fran Bow
Bendy and the Ink Machine
Five Nights at Freddy's
Sanitarium
OXENFREE
Thimbleweed Park
Samsara Room
Into the Dead 2
Slayaway Camp
Eyes - the horror game
Slendrina:The Cellar
Hello Neighbor
Alien: Blackout
Rest in Pieces
Friday the 13th: Killer Puzzle
I Am Innocent
Detention
Limbo
Knock-Knock
Sara Is Missing
Death Park: Scary Horror Clown
Horror Hospital 2
Horrorfield - Multiplayer Survival Horror Game
Erich Sann: Horror in the scary Academy
The Innsmouth Case
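The split-based approach above drops the first whitespace-separated token, so it relies on the rank always being a single token at the start of the text. As a rough alternative sketch, the nested <div> holding the rank can be removed before the text is read. The inline SAMPLE_HTML and the helper name title_without_rank are assumptions made for illustration; the sample only imitates the markup shown in the question's output, and the live page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that imitates the <h3> markup shown in the question.
SAMPLE_HTML = """
<h3 class="indent">
<div><span>1</span></div>
Fran Bow </h3>
<h3 class="indent">
<div><span>2</span></div>
Bendy and the Ink Machine </h3>
"""


def title_without_rank(h3):
    """Remove the nested rank <div> (if any) and return the remaining text."""
    rank = h3.find("div")
    if rank is not None:
        rank.decompose()  # note: this mutates the parsed tree
    return h3.get_text(" ", strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
titles = [title_without_rank(h3) for h3 in soup.find_all("h3", class_="indent")]
print(titles)
```

Because decompose() changes the tree, this variant suits cases where the rank number is not needed afterwards.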
CodePudding user response:
How to fix?
Because the text you want is always the last element in the <h3>, you can extract it from the contents of the <h3>:
element.contents[-1]
To get the text, iterate over the result set:
for x in bs4.find_all("h3",{"class":"indent"}):
    print(x.contents[-1].get_text(strip=True))
Example
import requests,pandas
from bs4 import BeautifulSoup
r = requests.get("https://www.pocketgamer.com/android/best-horror-games/?page=1",
headers={'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'})
c = r.content
bs4 = BeautifulSoup(c,"html.parser")
all = [x.contents[-1].get_text(strip=True) for x in bs4.find_all("h3",{"class":"indent"})]
print(all)
Output
['Fran Bow', 'Bendy and the Ink Machine', "Five Nights at Freddy's", 'Sanitarium', 'OXENFREE', 'Thimbleweed Park', 'Samsara Room', 'Into the Dead 2', 'Slayaway Camp', 'Eyes - the horror game', 'Slendrina:The Cellar', 'Hello Neighbor', 'Alien: Blackout', 'Rest in Pieces', 'Friday the 13th: Killer Puzzle', 'I Am Innocent', 'Detention', 'Limbo', 'Knock-Knock', 'Sara Is Missing', 'Death Park: Scary Horror Clown', 'Horror Hospital 2', 'Horrorfield - Multiplayer Survival Horror Game', 'Erich Sann: Horror in the scary Academy', 'The Innsmouth Case']
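Taking contents[-1] assumes the title is always the last child of the <h3>. If that is not guaranteed, one possible variation is to collect only the text nodes that are direct children of the <h3>, which skips the nested rank wherever it sits. This is a sketch on a hypothetical inline sample (SAMPLE_HTML and the helper name direct_text are made up for illustration), not a check against the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that imitates the <h3> markup shown in the question.
SAMPLE_HTML = """
<h3 class="indent">
<div><span>1</span></div>
Fran Bow </h3>
<h3 class="indent">
<div><span>2</span></div>
Five Nights at Freddy's </h3>
"""


def direct_text(tag):
    """Join the text nodes that are direct children of tag, ignoring nested tags."""
    parts = tag.find_all(string=True, recursive=False)
    # Collapse newlines and repeated spaces into single spaces.
    return " ".join("".join(parts).split())


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
titles = [direct_text(h3) for h3 in soup.find_all("h3", class_="indent")]
print(titles)
```

Unlike removing the <div>, this leaves the parsed tree untouched, so the rank can still be read from the same <h3> later.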