Home > Back-end >  How do I find data of parent that includes specific child?
How do I find data of parent that includes specific child?

Time:05-04

I want to find a way that if the parent include certain child with p, then it extract data from that specific div.

This is what I did but it will still give me the data of other <div> where it doesn't include the <p> as well.

import requests
from bs4 import BeautifulSoup
import smtplib
import os
from csv import writer

for n in range (1, 2):
  url = f'https://www.musinsa.com/app/reviews/lists?type=&year_date=2022&month_date=&day_date=&max_rt=2022&min_rt=2009&brand=&page={n}'
  headers = {"User-Agent": ""}
  page = requests.get(url, headers=headers)
  Soup1 = BeautifulSoup(page.content, "html.parser")
  Soup2 = BeautifulSoup(Soup1.prettify(), "html.parser")
  if Soup2.find_all('p', {'class':'review-profile__body_information'}):
    ss = Soup2.find_all('p', {'class':'review-profile__body_information'})
    product_name = Soup2.find_all('a',{'class':'review-goods-information__name'})
    rating = Soup2.find_all('span',{'class':'review-list__rating__active'})
    comment = Soup2.find_all('div',{'class','review-contents__text '})
    eval = Soup2.find_all('li',{'class':'review-evaluation__item'})
    images = Soup2.find_all('li',{'class':'review-content-photo__item'})
    allinfo = [ss, product_name, rating, comment, eval, images]
    print(allinfo)

How do I need to write so that it will only give me data of 'div' that includes specific <p>? thanks

1

CodePudding user response:

I recommend you to use XPath. more details about it here: https://www.zyte.com/blog/an-introduction-to-xpath-with-examples/

so what about code? I recommend you to use the lxml module (https://pypi.org/project/lxml/) which supports using xpathes

# pip install lxml
from lxml import html
import requests


url = "your url you want to get"
request = requests.get(url)

# there we convert request content(page) to html class object
page_tree = html.fromstring(request.content)

# to use xpath you need to use this
element = page_tree.xpath("xpath to object")

#to get text of an element
text = element.text

#to get specific value use this
value = element.get("href") #or enother property of an element

So how to get a child? To get a child of an element you need to add "/child::" to the end of your parent XPath for example:

<div class='main_div'>
    <div class='parent'>
        <p class='content'>
            "text you need"
        </p>
    </div>
</div>

to get "text you need" you can use following xpath: //div[@class='content']/child::p

You need to learn more abouth xpath, it is more simple that using beautiful soup to find elements

CodePudding user response:

One approach could be to select your elements mor specific e.g. with css selectors:

soup.select('div.review-list:has(p.review-profile__body_information)')

You should also change your strategy from generating a lot of lists, better would be to itereate each item, check and pick the needed info.

Example
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0'}

data = []
for n in range (1,3):
    url = f'https://www.musinsa.com/app/reviews/lists?type=&year_date=2022&month_date=&day_date=&max_rt=2022&min_rt=2009&brand=&page={n}'
    res = requests.get(url, headers=headers)
    soup = BeautifulSoup(res.text)
    for e in soup.select('div.review-list:has(p.review-profile__body_information)'):
        data.append({
            'product_name':e.select_one('.review-goods-information__name').text.strip(),
            'comment':e.select_one('.review-contents__text').text.strip(),
            'something':'more ...'
        })
data
Output
[{'product_name': '[패키지] DUMBY BEAR 2PACK T-SHIRTS',
  'comment': '중학생 둘째아들 입힐려고 구매했습니다 \n편하고 이쁘네요. 잘 입을거 같아요',
  'something': 'more ...'},
 {'product_name': '컴피 워셔 후드 숏 자켓_차콜',
  'comment': '주문한지 얼마 되지 않아서 바로 도착을 하였네요',
  'something': 'more ...'},
 {'product_name': '컬러플러스 미니로고 버뮤다 숏팬츠 MINT GREEN',
  'comment': '색상도 이뿌고 반바지 기장도 넉넉하니 부담없이\n자주 입게 되겠네요~',
  'something': 'more ...'},
 {'product_name': 'CARGO STRING PANTS _ OLIVE',
  'comment': '검은색 사고 재구매인데 너무 편하고 이쁩니다 \n만족!',
  'something': 'more ...'}]
  • Related