How to scrape text of span classes that have the same class value?

Time:12-09

I want to get the data shown in the screenshot.

When I run the following code:

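import requests
from bs4 import BeautifulSoup as bts

def getAndParseURL(url):
    # Fetch the page with a browser-like User-Agent and parse the HTML.
    result = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
    soup = bts(result.text, 'html.parser')
    return soup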

html = getAndParseURL("https://www.cimri.com/cep-telefonlari/en-ucuz-oppo-a74-128gb-4gb-ram-6-43-inc-48mp-akilli-cep-telefonu-siyah-fiyatlari,775993409")

for i in html.findAll("div",{"class":"s10v53f3-0 bfgzQt"}) :
    for b in i.findAll("ul",{"class":"s10v53f3-2 goYFek"}) :
        for c in b.findAll("li",{"class":"s10v53f3-4 rKbMg"}) :
            for d in c.findAll("span",{"class":"s10v53f3-6 geozbR"}) :
                print(d)

It prints all of the technical properties, like this:

<span class="s10v53f3-6 geozbR">6.43 inç</span>
<span class="s10v53f3-6 geozbR">AMOLED</span>
<span class="s10v53f3-6 geozbR">FHD </span>
<span class="s10v53f3-6 geozbR">1080x2400 Piksel</span>
<span class="s10v53f3-6 geozbR">84.4 %</span>
<span class="s10v53f3-6 geozbR">409 PPI</span>
<span class="s10v53f3-6 geozbR">Kapasitif Ekran</span>
<span class="s10v53f3-6 geozbR">800</span>
<span class="s10v53f3-6 geozbR">1000000:1</span>
<span class="s10v53f3-6 geozbR">Qualcomm SM6115 Snapdragon 662</span>
<span class="s10v53f3-6 geozbR">2.0 GHz</span>
<span class="s10v53f3-6 geozbR">Adreno 610</span>
<span class="s10v53f3-6 geozbR">4 GB RAM</span>
<span class="s10v53f3-6 geozbR">Android 11</span>
<span class="s10v53f3-6 geozbR">Android</span>
<span class="s10v53f3-6 geozbR">8 Çekirdek</span>
<span class="s10v53f3-6 geozbR">11 nm</span>
<span class="s10v53f3-6 geozbR">64 bit</span>
<span class="s10v53f3-6 geozbR">950 MHz</span>
<span class="s10v53f3-6 geozbR">LPDDR4x</span>
<span class="s10v53f3-6 geozbR">Çift Kanal</span>
<span class="s10v53f3-6 geozbR">48 MP</span>
<span class="s10v53f3-6 geozbR">F2.4</span>
<span class="s10v53f3-6 geozbR">16 MP</span>
<span class="s10v53f3-6 geozbR">F1.7</span>
<span class="s10v53f3-6 geozbR">F2.4</span>
<span class="s10v53f3-6 geozbR">1080p (Full HD)</span>
<span class="s10v53f3-6 geozbR">30 FPS</span>
<span class="s10v53f3-6 geozbR">2 MP</span>
<span class="s10v53f3-6 geozbR">LED</span>
<span class="s10v53f3-6 geozbR">73.8 mm</span>
<span class="s10v53f3-6 geozbR">160.3 mm</span>
<span class="s10v53f3-6 geozbR">8 mm</span>
<span class="s10v53f3-6 geozbR">175 gr</span>
<span class="s10v53f3-6 geozbR">Siyah</span>
<span class="s10v53f3-6 geozbR">USB Type-C</span>
<span class="s10v53f3-6 geozbR">Li-Po</span>
<span class="s10v53f3-6 geozbR">5000 mAh</span>
<span class="s10v53f3-6 geozbR">128 GB</span>
<span class="s10v53f3-6 geozbR">5.0</span>
<span class="s10v53f3-6 geozbR">3.5 mm</span>
<span class="s10v53f3-6 geozbR">Wi-Fi 5</span>
<span class="s10v53f3-6 geozbR">42.2 Mbps</span>
<span class="s10v53f3-6 geozbR">5.76 Mbps</span>
<span class="s10v53f3-6 geozbR">2021</span>
<span class="s10v53f3-6 geozbR">Ekran İçinde</span>
<span class="s10v53f3-6 geozbR">Nano-SIM (4FF)</span>
<span class="s10v53f3-6 geozbR">30</span>
<span class="s10v53f3-6 geozbR">1080p</span>

I have already collected all of the features as a dict. However, when I looked at all of the phones, the number of features differs for every brand and every model, and to build a dataframe every brand and model must have the same columns. So I have decided to put only some of these features into the dataframe.

CodePudding user response:

Note: Your question needs more clarity to get specific answers, so I will just show two options that address your comment and should get you closer. Both are based on a product that was available at the time of the request.

I want to get a specific one, let's say processor model and memory size only.

Option#1

Simply select the span that contains your attribute name and get the text from its adjacent sibling:

processorModel = soup.select_one('span:-soup-contains("İşlemci Modeli") + span').text
--> Apple A13 Bionic

memorySize = soup.select_one('span:-soup-contains("RAM Kapasitesi") + span').text
--> 3 GB RAM
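
Since not every product lists every attribute, calling `.text` directly raises an error when the selector finds nothing. The sketch below wraps the same selector in a small helper that returns None for a missing attribute. The helper name `spec_value` and the sample markup are made up for illustration and only approximate the real page; `:-soup-contains` assumes a reasonably recent soupsieve (older releases spell it `:contains`).

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's structure may differ.
SAMPLE_HTML = """
<ul>
  <li><span>İşlemci Modeli</span><span>Apple A13 Bionic</span></li>
  <li><span>RAM Kapasitesi</span><span>3 GB RAM</span></li>
</ul>
"""

def spec_value(soup, label):
    """Return the text of the span right after the span containing `label`, or None."""
    node = soup.select_one(f'span:-soup-contains("{label}") + span')
    return node.get_text(strip=True) if node else None

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "Antutu Puanı" is deliberately absent from the sample to show the fallback.
wanted = ["İşlemci Modeli", "RAM Kapasitesi", "Antutu Puanı"]
row = {label: spec_value(sample_soup, label) for label in wanted}
print(row)
```

Because every product yields a value (or None) for the same list of labels, each row ends up with the same keys regardless of how many attributes the page actually shows.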

Option#2

Create a dict with structured information and iterate over to pick your attributes:

specs = {}
for x in soup.select('[name="specs"] ul'):
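    # Key: section title from the first <li>; value: label/value pairs from each <li> that contains spans.
    specs[x.li.text] = {
        list(s.stripped_strings)[0]: list(s.stripped_strings)[1]
        for s in x.select('li:has(span)')
    }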
specs

-->

{'Model Bilgisi': {'Iphone Modelleri': 'Iphone SE'},
 'Ekran Özellikleri': {'Ekran Boyutu': '4.7 inç',
  'Ekran Teknolojisi': 'IPS LCD',
  'Çözünürlük Standartı': 'HD ',
  'Ekran Çözünürlüğü': '750x1334 Piksel',
  'Ekran Gövde Oranı': '65.4 %',
  'Piksel Yoğunluğu': '326 PPI',
  'Multi Touch': 'Var',
  'Dokunmatik Türü': 'Kapasitif Ekran',
  'Ekran Parlaklığı (cd/m²)': '625',
  'Çizilmeye Karşı Dayanıklılık': 'Var',
  'Ekran Kontrast Oranı': '1400:1',
  'Sürekli Açık Ekran': 'Yok'},
 'Teknik Özellikler': {'İşlemci Modeli': 'Apple A13 Bionic',
  'İşlemci Frekansı': '2.65 GHz',
  'Grafik İşlemci (GPU)': 'Apple GPU',
  'RAM Kapasitesi': '3 GB RAM',
  'Antutu Puanı': 'Belirtilmemiş',
  'İşletim Sistemi Versiyonu': 'iOS 13',
  'İşletim Sistemi': 'iOS',
  'CPU Üretim Süreci': '7 nm '},...}
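
To get from a nested dict like this to rows that all share the same columns, one possible approach is to flatten the sections and keep only a fixed list of attribute names, filling in None where a phone lacks one. This is a minimal sketch: `specs_by_phone`, `WANTED`, and `pick` are names invented here, the first entry is a trimmed copy of the output above, and the second phone is purely hypothetical.

```python
# Trimmed copy of the nested dict shown above, plus a hypothetical second
# phone that lacks one attribute.
specs_by_phone = {
    "Iphone SE": {
        "Model Bilgisi": {"Iphone Modelleri": "Iphone SE"},
        "Teknik Özellikler": {
            "İşlemci Modeli": "Apple A13 Bionic",
            "RAM Kapasitesi": "3 GB RAM",
        },
    },
    "Hypothetical phone": {
        "Teknik Özellikler": {"İşlemci Modeli": "Example CPU"},
    },
}

# The attributes that should become columns, in order.
WANTED = ["İşlemci Modeli", "RAM Kapasitesi"]

def pick(specs, wanted):
    """Flatten the sections and return only the wanted attributes (None if missing)."""
    flat = {}
    for section in specs.values():
        # If two sections share a label, the later section wins.
        flat.update(section)
    return {key: flat.get(key) for key in wanted}

rows = [{"phone": name, **pick(specs, WANTED)} for name, specs in specs_by_phone.items()]
for r in rows:
    print(r)
```

Every row has the same keys, so a list like `rows` can be handed to `pandas.DataFrame(rows)` to build the dataframe.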