When I execute the following code:
import requests
from bs4 import BeautifulSoup as bts  # assuming bts is an alias for BeautifulSoup

def getAndParseURL(url):
    # Fetch the page with a browser-like User-Agent and parse the HTML
    result = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
    soup = bts(result.text, 'html.parser')
    return soup

html = getAndParseURL("https://www.cimri.com/cep-telefonlari/en-ucuz-oppo-a74-128gb-4gb-ram-6- 43-inc-48mp-akilli-cep-telefonu-siyah-fiyatlari,775993409")
# Walk down the nested spec containers and print each value span
for i in html.findAll("div", {"class": "s10v53f3-0 bfgzQt"}):
    for b in i.findAll("ul", {"class": "s10v53f3-2 goYFek"}):
        for c in b.findAll("li", {"class": "s10v53f3-4 rKbMg"}):
            for d in c.findAll("span", {"class": "s10v53f3-6 geozbR"}):
                print(d)
It gives me all the technical properties, as shown below:
<span class="s10v53f3-6 geozbR">6.43 inç</span>
<span class="s10v53f3-6 geozbR">AMOLED</span>
<span class="s10v53f3-6 geozbR">FHD </span>
<span class="s10v53f3-6 geozbR">1080x2400 Piksel</span>
<span class="s10v53f3-6 geozbR">84.4 %</span>
<span class="s10v53f3-6 geozbR">409 PPI</span>
<span class="s10v53f3-6 geozbR">Kapasitif Ekran</span>
<span class="s10v53f3-6 geozbR">800</span>
<span class="s10v53f3-6 geozbR">1000000:1</span>
<span class="s10v53f3-6 geozbR">Qualcomm SM6115 Snapdragon 662</span>
<span class="s10v53f3-6 geozbR">2.0 GHz</span>
<span class="s10v53f3-6 geozbR">Adreno 610</span>
<span class="s10v53f3-6 geozbR">4 GB RAM</span>
<span class="s10v53f3-6 geozbR">Android 11</span>
<span class="s10v53f3-6 geozbR">Android</span>
<span class="s10v53f3-6 geozbR">8 Çekirdek</span>
<span class="s10v53f3-6 geozbR">11 nm</span>
<span class="s10v53f3-6 geozbR">64 bit</span>
<span class="s10v53f3-6 geozbR">950 MHz</span>
<span class="s10v53f3-6 geozbR">LPDDR4x</span>
<span class="s10v53f3-6 geozbR">Çift Kanal</span>
<span class="s10v53f3-6 geozbR">48 MP</span>
<span class="s10v53f3-6 geozbR">F2.4</span>
<span class="s10v53f3-6 geozbR">16 MP</span>
<span class="s10v53f3-6 geozbR">F1.7</span>
<span class="s10v53f3-6 geozbR">F2.4</span>
<span class="s10v53f3-6 geozbR">1080p (Full HD)</span>
<span class="s10v53f3-6 geozbR">30 FPS</span>
<span class="s10v53f3-6 geozbR">2 MP</span>
<span class="s10v53f3-6 geozbR">LED</span>
<span class="s10v53f3-6 geozbR">73.8 mm</span>
<span class="s10v53f3-6 geozbR">160.3 mm</span>
<span class="s10v53f3-6 geozbR">8 mm</span>
<span class="s10v53f3-6 geozbR">175 gr</span>
<span class="s10v53f3-6 geozbR">Siyah</span>
<span class="s10v53f3-6 geozbR">USB Type-C</span>
<span class="s10v53f3-6 geozbR">Li-Po</span>
<span class="s10v53f3-6 geozbR">5000 mAh</span>
<span class="s10v53f3-6 geozbR">128 GB</span>
<span class="s10v53f3-6 geozbR">5.0</span>
<span class="s10v53f3-6 geozbR">3.5 mm</span>
<span class="s10v53f3-6 geozbR">Wi-Fi 5</span>
<span class="s10v53f3-6 geozbR">42.2 Mbps</span>
<span class="s10v53f3-6 geozbR">5.76 Mbps</span>
<span class="s10v53f3-6 geozbR">2021</span>
<span class="s10v53f3-6 geozbR">Ekran İçinde</span>
<span class="s10v53f3-6 geozbR">Nano-SIM (4FF)</span>
<span class="s10v53f3-6 geozbR">30</span>
<span class="s10v53f3-6 geozbR">1080p</span>
I have already collected all the features as a dict, but when I looked across the phones, every brand and model has a different number of features. To build a DataFrame, every brand and model must have the same columns, so I have decided to keep only some of these features in the DataFrame.
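To make the column problem concrete, here is a minimal sketch of aligning per-phone dicts to a fixed set of columns. The `phones` list and the "Batarya Kapasitesi" label are hypothetical and only illustrate dicts with differing keys; missing features become `None`.

```python
# Hypothetical per-phone feature dicts; the keys differ between models.
phones = [
    {"İşlemci Modeli": "Qualcomm SM6115 Snapdragon 662",
     "RAM Kapasitesi": "4 GB RAM",
     "Batarya Kapasitesi": "5000 mAh"},   # assumed label, for illustration only
    {"İşlemci Modeli": "Apple A13 Bionic",
     "Multi Touch": "Var"},               # no RAM entry on purpose
]

# The features every row must have, in a fixed order.
columns = ["İşlemci Modeli", "RAM Kapasitesi"]

# Keep only the chosen columns; dict.get() fills missing features with None.
rows = [{col: phone.get(col) for col in columns} for phone in phones]

for row in rows:
    print(row)
```

Since every row now has the same keys, a list like `rows` could be passed to `pandas.DataFrame(rows, columns=columns)`.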
CodePudding user response:
Note: Your question needs more clarity to get specific answers, so I will just show two options that address your comment and should get you closer. Both are based on a product that was available at the time of the request.
I want to get a specific one, let's say, processor model and memory size only.
Option #1
Simply select the span that contains your attribute and get the text from the span nested inside it:
processorModel = soup.select_one('span:-soup-contains("İşlemci Modeli") span').text
--> Apple A13 Bionic
memorySize = soup.select_one('span:-soup-contains("RAM Kapasitesi") span').text
--> 3 GB RAM
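Not every product page lists every attribute, and `select_one` returns `None` when nothing matches, so calling `.text` directly would raise an error. A minimal sketch of a guarded lookup follows; the helper `get_spec` and the `SAMPLE_HTML` markup are hypothetical, and the real page structure may differ from what is assumed here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a label span with the value in a nested span.
SAMPLE_HTML = """
<ul>
  <li><span>İşlemci Modeli<span>Apple A13 Bionic</span></span></li>
  <li><span>RAM Kapasitesi<span>3 GB RAM</span></span></li>
</ul>
"""

def get_spec(soup, label):
    """Return the value for `label`, or None if the page does not list it."""
    node = soup.select_one(f'span:-soup-contains("{label}") span')
    return node.get_text(strip=True) if node else None

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
wanted = ["İşlemci Modeli", "RAM Kapasitesi", "Antutu Puanı"]

# One dict per product, with the same keys whether or not a value was found.
row = {label: get_spec(soup, label) for label in wanted}
print(row)
```

Because the keys of `row` come from `wanted` rather than from the page, every product yields the same columns.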
Option #2
Create a dict with structured information and iterate over it to pick your attributes:
specs = {}
for x in soup.select('[name="specs"] ul'):
    specs[x.li.text] = {list(s.stripped_strings)[0]: list(s.stripped_strings)[1] for s in x.select('li:has(span)')}
specs
-->
{'Model Bilgisi': {'Iphone Modelleri': 'Iphone SE'},
'Ekran Özellikleri': {'Ekran Boyutu': '4.7 inç',
'Ekran Teknolojisi': 'IPS LCD',
'Çözünürlük Standartı': 'HD ',
'Ekran Çözünürlüğü': '750x1334 Piksel',
'Ekran Gövde Oranı': '65.4 %',
'Piksel Yoğunluğu': '326 PPI',
'Multi Touch': 'Var',
'Dokunmatik Türü': 'Kapasitif Ekran',
'Ekran Parlaklığı (cd/m²)': '625',
'Çizilmeye Karşı Dayanıklılık': 'Var',
'Ekran Kontrast Oranı': '1400:1',
'Sürekli Açık Ekran': 'Yok'},
'Teknik Özellikler': {'İşlemci Modeli': 'Apple A13 Bionic',
'İşlemci Frekansı': '2.65 GHz',
'Grafik İşlemci (GPU)': 'Apple GPU',
'RAM Kapasitesi': '3 GB RAM',
'Antutu Puanı': 'Belirtilmemiş',
'İşletim Sistemi Versiyonu': 'iOS 13',
'İşletim Sistemi': 'iOS',
'CPU Üretim Süreci': '7 nm '},...}
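Picking the wanted attributes out of this nested dict could look like the sketch below. The helper `pick_specs` is hypothetical, and the `specs` literal is only an excerpt of the output above. It assumes each label appears in one section; if a label repeats, the later section wins.

```python
# Excerpt of the nested dict shown above (section -> label -> value).
specs = {
    "Model Bilgisi": {"Iphone Modelleri": "Iphone SE"},
    "Teknik Özellikler": {
        "İşlemci Modeli": "Apple A13 Bionic",
        "RAM Kapasitesi": "3 GB RAM",
    },
}

def pick_specs(specs, wanted):
    """Flatten section -> label -> value and keep only the wanted labels."""
    flat = {}
    for section in specs.values():
        flat.update(section)
    # Labels missing from the page map to None so the keys stay consistent.
    return {label: flat.get(label) for label in wanted}

picked = pick_specs(specs, ["İşlemci Modeli", "RAM Kapasitesi"])
print(picked)
```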