I want to extract the number that comes before "opiniones". I can find the span
that contains it, but I cannot retrieve the number itself.
Code example:
list_rest = []
for res_name, res_stats in zip(top_rest, top_rest_info):
    dataframe = {}
    dataframe["pos"] = res_name.find('a').contents[0]
    dataframe["name"] = res_name.find('a').contents[-1]
    dataframe["number_of_reviews"] = res_stats.find("span", attrs={"class": "NoCoR"})
    list_rest.append(dataframe)
Output:
[{'pos': 'La Gourmesa',
'name': 'La Gourmesa',
'number_of_reviews': <span class="NoCoR">3<!-- --> opiniones</span>},
{'pos': '1',
'name': 'Parrilla Urbana División del Norte',
'number_of_reviews': <span class="NoCoR">486<!-- --> opiniones</span>},
{'pos': '2',
'name': 'La Mansion Marriott Reforma',
'number_of_reviews': <span class="NoCoR">730<!-- --> opiniones</span>},
{'pos': '3',
'name': 'Restaurante Condimento Emporio Reforma',
'number_of_reviews': <span class="NoCoR">283<!-- --> opiniones</span>},
{'pos': '4',
'name': "Porfirio's Coapa",
'number_of_reviews': <span class="NoCoR">468<!-- --> opiniones</span>}]
How do I extract the number in number of reviews?
Answer:
Here I have used a small piece of HTML as an example. You can use get_text()
or the .text property to extract the text from the tag, then split it on a space
and take the first field.
from bs4 import BeautifulSoup

html = """<span class='NoCoR'>3<!-- --> opiniones</span>
<span class='NoCoR'>486<!-- --> opiniones</span>
<span class='NoCoR'>730<!-- --> opiniones</span>"""

soup = BeautifulSoup(html, "html.parser")
main_data = soup.find_all("span", attrs={"class": "NoCoR"})
for data in main_data:
    # get_text() skips the <!-- --> comment, so the text is e.g. "3 opiniones"
    print(data.get_text().split(" ")[0])
Output:
3
486
730
For your code it should work like this:
dataframe["number_of_reviews"] = res_stats.find("span", attrs={"class": "NoCoR"}).get_text().split(" ")[0]
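The line above stores a string such as "486", and it raises AttributeError if a listing has no NoCoR span (find() returns None). A minimal sketch of a guarded version that also converts to int, assuming the count is always the first whitespace-separated token; the helper name review_count and the sample <div> markup are hypothetical, not taken from the real page:

```python
from bs4 import BeautifulSoup


def review_count(container):
    """Hypothetical helper: return the review count in `container` as an int, or None.

    Assumes the count is the first whitespace-separated token of the
    <span class="NoCoR"> text, e.g. "486 opiniones".
    """
    span = container.find("span", attrs={"class": "NoCoR"})
    if span is None:
        return None  # this listing has no review span
    first_token = span.get_text().split()[0]
    return int(first_token)


# Hypothetical sample markup standing in for res_stats elements
html = """<div><span class="NoCoR">486<!-- --> opiniones</span></div>
<div><span class="other">no reviews</span></div>"""
soup = BeautifulSoup(html, "html.parser")
for block in soup.find_all("div"):
    print(review_count(block))
```

In the question's loop this would be used as dataframe["number_of_reviews"] = review_count(res_stats).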
Answer:
You are already using .contents in your own code, so why not use it to grab the number from the tag too?
Solution
The children of a tag are available in a list called .contents,
so picking the first one should solve your issue. Append .contents[0]
to your line of code:
res_stats.find("span", attrs={"class": "NoCoR"}).contents[0]
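To see why .contents[0] works, it helps to look at what the span actually holds: the number, the empty HTML comment, and the word "opiniones" are three separate children. A small sketch (the single-span markup is an illustrative sample) that prints each child and converts the first one, which is a NavigableString, to an int:

```python
from bs4 import BeautifulSoup, Comment

soup = BeautifulSoup('<span class="NoCoR">486<!-- --> opiniones</span>', "html.parser")
span = soup.find("span", attrs={"class": "NoCoR"})

# .contents holds three children: the text "486", an HTML comment, and " opiniones"
for child in span.contents:
    print(repr(child), isinstance(child, Comment))

# The first child is a NavigableString (a str subclass), so int() accepts it
count = int(span.contents[0])
print(count)
```

This relies on the number being the very first child of the span; if the markup ever placed a tag before it, .contents[0] would no longer point at the number.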
Example with several review spans:
from bs4 import BeautifulSoup

html = '''<span class='NoCoR'>3<!-- --> opiniones</span><span class='NoCoR'>486<!-- --> opiniones</span><span class='NoCoR'>730<!-- --> opiniones</span><span class='NoCoR'>283<!-- --> opiniones</span><span class='NoCoR'>468<!-- --> opiniones</span>'''
soup = BeautifulSoup(html, 'html.parser')
for opinion in soup.select('span.NoCoR'):
    # First child of the span is the number as text
    print(opinion.contents[0])
Output:
3
486
730
283
468
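Both answers return the number as a string. If larger counts on the page were written with a thousands separator such as "1,234" (an assumption; none appear in the question's output), both int(token) and the raw string would be wrong. A minimal sketch of one way to normalise that, using re.sub to drop non-digit characters from the first token; the helper name parse_count is hypothetical:

```python
import re
from bs4 import BeautifulSoup


def parse_count(span):
    # Hypothetical helper. First whitespace-separated token, e.g. "486" or (assumed) "1,234"
    token = span.get_text().split()[0]
    # Drop any non-digit characters, such as "," or "." used as thousands separators
    return int(re.sub(r"\D", "", token))


html = ('<span class="NoCoR">486<!-- --> opiniones</span>'
        '<span class="NoCoR">1,234<!-- --> opiniones</span>')
soup = BeautifulSoup(html, "html.parser")
counts = [parse_count(span) for span in soup.select("span.NoCoR")]
print(counts)
```

Note that stripping every non-digit would mangle abbreviated forms like "1.2k", so this only makes sense if the page always prints full integers.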