Here's my script:
import re
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

URLs = ['https://www.iwc.com/fr/fr/watch-collections/pilot-watches/iw329303-big-pilots-watch-43.html']
Marques = []
Brands = []
Refs = []
Prices = []
#Carts = []
#Links = []
References = []
Links = []

for url in URLs:
    results = requests.get(url)
    soup = BeautifulSoup(results.text, "html.parser")
    Marques.append('IWC')
    Brand = soup.find('span', class_ = 'iwc-buying-options-title').text
    Brand = str(Brand)
    Brand = re.sub("Ajouter à la liste de souhaits", '', Brand)
    Brand = re.sub("\n", '', Brand)
    Brands.append(Brand)
    # The list is named Prices; the original Price.append(...) would raise a NameError
    Prices.append(soup.find('div', class_ = 'iwc-buying-options-price').get_text(strip=True))
    Links.append(url)
    References.append(soup.find('h1', class_ = 'iwc-buying-options-reference').text)

print(Brand)
print(Prices)
print(Links)
print(References)
Unfortunately, Brand gives me this: [" Grande Montre d'Aviateur\xa043 "]
References gives me this: ['\n IW329303\n ']
And Prices gives me nothing. I think it's because the element does not contain any text, as you can see:
print(soup.find('div', class_ = 'iwc-buying-options-price'))
<div ></div>
Any ideas on how to do that?
I would like this output:
CodePudding user response:
You'll want to use .strip() to get rid of that whitespace. For example:
Brand = soup.find('span', class_ = 'iwc-buying-options-title').text.strip()
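As a small illustration of the whitespace cleanup, here is a sketch that works on the raw strings quoted in the question, so it runs without fetching the page. The helper name clean is made up for this example. It assumes you also want the non-breaking space (\xa0) inside the title turned into a regular space, which .strip() alone does not do because it only trims the ends of the string.

```python
# Raw values copied from the output shown in the question.
raw_brand = " Grande Montre d'Aviateur\xa043 "
raw_reference = "\n IW329303\n "


def clean(text):
    """Trim the ends and collapse every whitespace run to a single space.

    str.split() with no arguments splits on any Unicode whitespace,
    including newlines and the non-breaking space (\xa0).
    (Hypothetical helper, not part of the original script.)
    """
    return " ".join(text.split())


# .strip() trims the ends but keeps the \xa0 in the middle of the title.
stripped_brand = raw_brand.strip()

cleaned_brand = clean(raw_brand)
cleaned_reference = clean(raw_reference)

print(repr(stripped_brand))
print(repr(cleaned_brand))
print(repr(cleaned_reference))
```

If you only care about leading and trailing whitespace, .strip() is enough; the split/join version is just one way to normalize the inner spacing as well.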
Price is unfortunately not as easy. The page is dynamic, meaning the price div is empty in the static HTML returned by requests. The price is, however, available as JSON in the data-tracking-products attribute of another tag (a submit button):
import re
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import json

URLs = ['https://www.iwc.com/fr/fr/watch-collections/pilot-watches/iw329303-big-pilots-watch-43.html']
Marques = []
Brands = []
Refs = []
Prices = []
#Carts = []
#Links = []
References = []
Links = []

for url in URLs:
    results = requests.get(url)
    soup = BeautifulSoup(results.text, "html.parser")
    Marques.append('IWC')
    # .strip() removes the leading and trailing whitespace
    Brand = soup.find('span', class_ = 'iwc-buying-options-title').text.strip()
    Brand = str(Brand)
    Brand = re.sub("Ajouter à la liste de souhaits", '', Brand)
    Brand = re.sub("\n", '', Brand)
    Brands.append(Brand)
    # The last submit button carries the product data as JSON in its
    # data-tracking-products attribute; take the price of the first product
    price = json.loads(soup.find_all('button', {'type':'submit'})[-1]['data-tracking-products'])[0]['price']
    Prices.append(price)
    Links.append(url)
    References.append(soup.find('h1', class_ = 'iwc-buying-options-reference').text.strip())

print(Brand)
print(Prices)
print(Links)
print(References)
Output:
Grande Montre d'Aviateur 43
['9100.00']
['https://www.iwc.com/fr/fr/watch-collections/pilot-watches/iw329303-big-pilots-watch-43.html']
['IW329303']
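The one-line price lookup will raise an exception if the button or its attribute is missing, or if the JSON changes shape. Here is a rough sketch of a more defensive version of the JSON step only. The function name price_from_tracking and the sample attribute value are made up for illustration. The only thing taken from the answer is that the attribute holds a JSON list whose first item has a 'price' key; the real attribute probably contains more fields.

```python
import json
from decimal import Decimal, InvalidOperation


def price_from_tracking(raw_attr):
    """Return the first product's price as a Decimal, or None if unavailable.

    raw_attr is the string value of the data-tracking-products attribute.
    (Hypothetical helper, not part of the original answer.)
    """
    if not raw_attr:
        return None
    try:
        products = json.loads(raw_attr)
        return Decimal(str(products[0]["price"]))
    except (ValueError, KeyError, IndexError, TypeError, InvalidOperation):
        # Covers invalid JSON, an empty list, a missing 'price' key,
        # an unexpected JSON shape, and a non-numeric price.
        return None


# Assumed sample value, reduced to the one key the answer relies on.
sample_attr = '[{"price": "9100.00"}]'

print(price_from_tracking(sample_attr))
print(price_from_tracking(None))
```

Inside the loop you could pass it the attribute of the last submit button, fetched with .get('data-tracking-products') after checking that find_all returned at least one button, and append the result to Prices. Returning a Decimal instead of the raw string is a design choice that makes later arithmetic on prices exact; keep the string if you only need to display it.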