I want to extract the specifications and the "Complete the look" text from a Myntra product page, which are only visible after I click on "show more". I wrote the following code for this:
import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

url = 'https://www.myntra.com/kurtas/jompers/jompers-men-yellow-printed-straight-kurta/11226756/buy'
df = pd.DataFrame(columns=['name','title','price','description','Size & fit','Material & care', 'Complete the look'])
metadata = dict.fromkeys(['name','title','price','description','Size & fit','Material & care', 'Complete the look'])
driver = webdriver.Chrome('chromedriver')
specs = dict()
for i in range(1): #len(links)
    driver.get(url)
    try:
        metadata['title'] = driver.find_element_by_class_name('pdp-title').get_attribute("innerHTML")
        metadata['name'] = driver.find_element_by_class_name('pdp-name').get_attribute("innerHTML")
        metadata['price'] = driver.find_element_by_class_name('pdp-price').find_element_by_xpath('./strong').get_attribute("innerHTML")
        metadata['description'] = driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[8]/div/div[1]/p').text
        #metadata['Specifications'] = driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[7]/div/div[4]/div[1]/div[1]/div[1]').text
        if driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[7]/div/div[4]/div[2]'):
            print('yes')
            element = driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[7]/div/div[4]/div[2]')
            element.click()
        for i in range(1,20):
            try:
                specs[driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[7]/div/div[4]/div[1]/div[{}]/div[1]'.format(i)).text] = driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[7]/div/div[4]/div[1]/div[{}]/div[2]'.format(i)).text
            except:
                break
        metadata['Complete the look'] = driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[8]/div/div[4]/div[2]/div/p/p').text
    except NoSuchElementException:
        pass
    df = df.append(metadata, ignore_index=True)
I get "yes" in the output, which I assume means the "show more" option was clicked, but the "Complete the look" column of my dataframe is None. How do I get the details hidden behind "show more", which has the following markup:
<div class="index-sizeFitDesc">
  <h4 class="index-sizeFitDescTitle index-product-description-title" style="padding-bottom: 12px;">Specifications</h4>
  <div class="index-tableContainer">
    <div class="index-row">
      <div class="index-rowKey">Sleeve Length</div>
      <div class="index-rowValue">Long Sleeves</div>
    </div>
    <div class="index-row">
      <div class="index-rowKey">Shape</div>
      <div class="index-rowValue">Straight</div>
    </div>
    <div class="index-row">
      <div class="index-rowKey">Neck</div>
      <div class="index-rowValue">Mandarin Collar</div>
    </div>
    <div class="index-row">
      <div class="index-rowKey">Print or Pattern Type</div>
      <div class="index-rowValue">Geometric</div>
    </div>
    <div class="index-row">
      <div class="index-rowKey">Design Styling</div>
      <div class="index-rowValue">Regular</div>
    </div>
    <div class="index-row">
      <div class="index-rowKey">Slit Detail</div>
      <div class="index-rowValue">Side Slits</div>
    </div>
    <div class="index-row">
      <div class="index-rowKey">Length</div>
      <div class="index-rowValue">Above Knee</div>
    </div>
    <div class="index-row">
      <div class="index-rowKey">Hemline</div>
      <div class="index-rowValue">Curved</div>
    </div>
  </div>
  <div class="index-showMoreText">See More</div>
</div>
CodePudding user response:
I did not go through all of the code you have written, but to click on "show more" I tried the code below; you can probably merge it into your existing code.
We first have to scroll to that particular element so that Selenium knows exactly where the element is. I then used a JS .click() to click on "show more".
Sample code :
# driver_path: path to your chromedriver executable
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
#driver.implicitly_wait(50)
wait = WebDriverWait(driver, 20)
driver.get("https://www.myntra.com/kurtas/jompers/jompers-men-yellow-printed-straight-kurta/11226756/buy")
# Wait until the "See More" element is present in the DOM
ele = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.index-showMoreText")))
# Scroll it into view, hover over it, then click it through JavaScript
driver.execute_script("arguments[0].scrollIntoView(true);", ele)
ActionChains(driver).move_to_element(ele).perform()
driver.execute_script("arguments[0].click();", ele)
# Read the text once the expanded paragraph is visible
Complete_The_Look = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p.index-product-description-content"))).text
print(Complete_The_Look)
Imports :
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
Output :
Sport this classic kurta from Jompers this season. Achieve a comfortably chic look for your next dinner party or family outing when you team this yellow piece with slim trousers and minimal flair.
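The scroll-then-JS-click step can also be pulled out into a small helper. What follows is only a minimal sketch: the helper name scroll_and_js_click and the RecordingDriver stand-in are hypothetical and not part of Selenium. The helper relies on nothing but execute_script, which a real WebDriver provides, so the stand-in is enough to show the order of the calls without opening a browser.

```python
# Sketch: scroll an element into view, then click it through JavaScript.
SCROLL_JS = "arguments[0].scrollIntoView(true);"
CLICK_JS = "arguments[0].click();"


def scroll_and_js_click(driver, element):
    """Scroll `element` into view and click it via JavaScript.

    `driver` only needs an execute_script(script, *args) method,
    which a Selenium WebDriver provides.
    """
    driver.execute_script(SCROLL_JS, element)
    driver.execute_script(CLICK_JS, element)


class RecordingDriver:
    """Hypothetical stand-in for a WebDriver; it only records the calls."""

    def __init__(self):
        self.calls = []

    def execute_script(self, script, *args):
        self.calls.append((script, args))


# Demonstration with the stand-in; with Selenium you would pass the real
# driver and the element returned by wait.until(...).
demo_driver = RecordingDriver()
scroll_and_js_click(demo_driver, "show-more-element")
for script, args in demo_driver.calls:
    print(script, args)
```

With a real session, the call would look like scroll_and_js_click(driver, ele), using the driver and ele from the sample code above.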
CodePudding user response:
The Specifications block in Product Details is a combination of several sections, and it is better to extract the details within those sections one by one.
It is also better to use relative XPaths for the elements.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

url = 'https://www.myntra.com/kurtas/jompers/jompers-men-yellow-printed-straight-kurta/11226756/buy'
# df = pd.DataFrame(columns=['name','title','price','description','Size & fit','Material & care', 'Complete the look'])
metadata = dict.fromkeys(['name','title','price','description','Size & fit','Material & care','Specifications', 'Complete the look'])
driver = webdriver.Chrome('chromedriver')
specs = dict()
specification = []
for i in range(1): #len(links)
    driver.get(url)
    try:
        metadata['title'] = driver.find_element_by_class_name('pdp-title').get_attribute("innerHTML")
        metadata['name'] = driver.find_element_by_class_name('pdp-name').get_attribute("innerHTML")
        metadata['price'] = driver.find_element_by_class_name('pdp-price').find_element_by_xpath('./strong').get_attribute("innerHTML")
        # Details were extracted even without scrolling, but it would be better to scroll down.
        driver.execute_script("arguments[0].scrollIntoView(true);",driver.find_element_by_xpath("//div[@class='pdp-productDescriptorsContainer']"))
        metadata['description'] = driver.find_element_by_xpath("//p[@class='pdp-product-description-content']").text
        #metadata['Specifications'] = driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[7]/div/div[4]/div[1]/div[1]/div[1]').text
        if driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[7]/div/div[4]/div[2]'):
            print('yes')
            element = driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[7]/div/div[4]/div[2]')
            element.click()
        metadata['Size & fit'] = driver.find_element_by_xpath("//h4[contains(text(),'Size')]/following-sibling::p").text
        metadata['Material & care']=driver.find_element_by_xpath("//h4[contains(text(),'Material')]/following-sibling::p").text
        # from Sleeve Length to Hemline
        specn1 = driver.find_elements_by_xpath("//div[@class='index-sizeFitDesc']/div[1]/div")
        for spec in specn1:
            key = spec.find_element_by_xpath("./div[@class='index-rowKey']").text
            value = spec.find_element_by_xpath("./div[@class='index-rowValue']").text
            specification.append([key,value])
        # from Colour Family to Occasion
        specn2 = driver.find_elements_by_xpath("//div[@class='index-sizeFitDesc']/div[2]/div[1]/div")
        for spec in specn2:
            key = spec.find_element_by_xpath("./div[@class='index-rowKey']").text
            value = spec.find_element_by_xpath("./div[@class='index-rowValue']").text
            specification.append([key, value])
        metadata['Specifications'] = specification
        metadata['Complete the look'] = driver.find_element_by_xpath("//h4[contains(text(),'Complete')]/following-sibling::p").text
        # metadata['Complete the look'] = driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[8]/div/div[4]/div[2]/div/p/p').text
    except Exception as e:
        print(e)
        pass
    for key,value in metadata.items():
        print(f"{key} : {value}")
    # df = df.append(metadata, ignore_index=True)
Output :
yes
name : Men Yellow Printed Straight Kurta
title : Jompers
price : Rs. 892
description : Yellow printed straight kurta, has a mandarin collar, long sleeves, straight hem, and side slits
Size & fit : The model (height 6') is wearing a size M
Material & care : Material: Cotton
Hand Wash
Specifications : [['Sleeve Length', 'Long Sleeves'], ['Shape', 'Straight'], ['Neck', 'Mandarin Collar'], ['Print or Pattern Type', 'Solid'], ['Design Styling', 'Regular'], ['Slit Detail', 'Side Slits'], ['Length', 'Knee Length'], ['Hemline', 'Straight'], ['Colour Family', 'Bright'], ['Weave Pattern', 'Regular'], ['Weave Type', 'Machine Weave'], ['Occasion', 'Daily']]
Complete the look : Sport this classic kurta from Jompers this season. Achieve a comfortably chic look for your next dinner party or family outing when you team this yellow piece with slim trousers and minimal flair.
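Because every specification row uses the same class names (index-row, index-rowKey, index-rowValue), the two loops could likely be folded into one lookup relative to each row. The following is only a sketch under assumptions: extract_specs, FakeElement and fake_row are hypothetical names, and the fake elements exist purely so that the snippet runs without a browser. It uses the generic find_element(by, value) form, since the find_element_by_* helpers are no longer available in recent Selenium 4 releases.

```python
CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def extract_specs(root):
    """Return {key: value} for every div.index-row found under `root`.

    `root` can be a WebDriver or a WebElement; both expose
    find_elements(by, value) and find_element(by, value).
    """
    specs = {}
    for row in root.find_elements(CSS, "div.index-row"):
        key = row.find_element(CSS, "div.index-rowKey").text
        value = row.find_element(CSS, "div.index-rowValue").text
        specs[key] = value
    return specs


class FakeElement:
    """Hypothetical stand-in for a WebElement, for demonstration only."""

    def __init__(self, text="", children=None):
        self.text = text
        self.children = children or {}

    def find_elements(self, by, selector):
        return self.children.get(selector, [])

    def find_element(self, by, selector):
        return self.children[selector][0]


def fake_row(key, value):
    """Build a fake div.index-row holding one key cell and one value cell."""
    return FakeElement(children={
        "div.index-rowKey": [FakeElement(key)],
        "div.index-rowValue": [FakeElement(value)],
    })


page = FakeElement(children={"div.index-row": [
    fake_row("Sleeve Length", "Long Sleeves"),
    fake_row("Neck", "Mandarin Collar"),
]})

# One flat dict per product: fixed fields plus one entry per specification.
row = {"name": "Men Yellow Printed Straight Kurta", "title": "Jompers"}
row.update(extract_specs(page))
print(row)
```

In a real session you would call extract_specs(driver) after clicking "show more". Collecting one such dict per product in a list and building the DataFrame once at the end (for example with pd.DataFrame(rows)) also avoids df.append, which was removed in pandas 2.0.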