Struggling to obtain a clean Excel export with Beautiful Soup

Time:11-04

I am trying to scrape a shop's opening hours from a website, but my result is pretty disappointing.

import requests
from bs4 import BeautifulSoup
import xlsxwriter

i = "90460"

URL = "https://www.tuodi.it/negozi-dettaglio.cfm?negozio=%s" % i
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

results = soup.find(id="orario" , style="width:50%;float:left")
orari = results.find_all("div", class_="tab", style="width:220px;line-height: 25px")

print(orari)

My output looks like the following:

[<div class="tab" style="width:220px;line-height: 25px">
                            8,30 
                            - 20,00 
                            <br/>
                            
                            
                            
                            8,30 
                            - 20,00 
                            <br/>...

But I would rather have a result that could be exported to Excel in the following form:

Excel result

Thanks in advance!

CodePudding user response:

To get your result you can use .stripped_strings and a list comprehension:

[''.join(x.split()) for x in orari[0].stripped_strings]

This gives you a list that you can write to a file:

['8,30-20,00', '8,30-20,00', '8,30-20,00', '8,30-20,00', '8,30-20,00', '8,30-20,00', '8,00-13,00']
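To see why the comprehension works, here is a minimal sketch of the whitespace clean-up on its own. The `fragments` list is an assumption: it imitates what `.stripped_strings` would be expected to yield for the `<div>` in the question, where the outer whitespace is already removed but the line break and indentation between the two times remain.

```python
# Assumed input: text fragments as .stripped_strings would likely yield them.
# Outer whitespace is gone, but the inner newline and indentation are still there.
fragments = [
    "8,30 \n                            - 20,00",
    "8,00 \n                            - 13,00",
]

# str.split() with no argument splits on any run of whitespace;
# ''.join(...) then glues the pieces back together with nothing in between.
cleaned = ["".join(text.split()) for text in fragments]

print(cleaned)  # ['8,30-20,00', '8,00-13,00']
```

If you would rather keep a space around the dash, `' '.join(text.split())` collapses each run of whitespace to a single space instead of removing it.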

Example

import requests
from bs4 import BeautifulSoup
import pandas as pd
i = "90460"

URL = "https://www.tuodi.it/negozi-dettaglio.cfm?negozio=%s" % i
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

results = soup.find(id="orario" , style="width:50%;float:left")
orari = results.find_all("div", class_="tab", style="width:220px;line-height: 25px")

data = [''.join(x.split()) for x in orari[0].stripped_strings]

# The extension must be .xlsx; pandas picks the Excel writer from it
pd.DataFrame([data]).to_excel('test.xlsx', index=False)
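The desired Excel layout is not visible in the question, so the following is only a sketch of one plausible shape: one row per weekday. The `DAYS` labels and the `hours_table` helper are hypothetical. They assume the page lists seven entries in Monday to Sunday order, which you should check against the site.

```python
import pandas as pd

# Hypothetical labels: assumes the page lists Monday through Sunday in order.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def hours_table(data, days=DAYS):
    """Pair each day label with its opening hours, one row per day."""
    if len(data) != len(days):
        # Fail loudly instead of silently misaligning days and hours.
        raise ValueError(f"expected {len(days)} entries, got {len(data)}")
    return pd.DataFrame({"day": days, "hours": data})


# Sample data copied from the list shown in the answer.
data = ['8,30-20,00'] * 6 + ['8,00-13,00']
table = hours_table(data)
print(table)
```

Calling `table.to_excel('test.xlsx', index=False)` would then write the two columns to a sheet. This needs an Excel engine such as openpyxl to be installed.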