I am new to web scraping. I am trying to scrape the "When purchased online" text from Target, but I could not find it in the HTML.
Does anyone know how to locate this element in the HTML? Any help is appreciated. Thanks!
Product Url:
https://www.target.com/c/allergy-sinus-medicines-treatments-health/-/N-4y5ny?Nao=144
CodePudding user response:
I don't know exactly which element you want to get, but the API sends JSON data, not HTML, so you can simply convert it to a dictionary/list and use keys/indexes to get the values.
You have to find the correct keys in the JSON data manually.
Alternatively, you can write a script that searches the JSON (using for-loops and recursion).
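A recursive search like the one mentioned above could look roughly like this. It is only a sketch: the helper name `find_key` and the `sample` dictionary are made up for illustration and are not part of the Target API or of the code below.

```python
def find_key(data, target, path=()):
    """Recursively yield (path, value) for every key named `target`."""
    if isinstance(data, dict):
        for key, value in data.items():
            new_path = path + (key,)
            if key == target:
                yield new_path, value
            # keep searching inside the value, it may be nested
            yield from find_key(value, target, new_path)
    elif isinstance(data, list):
        for index, item in enumerate(data):
            yield from find_key(item, target, path + (index,))


# Made-up sample, shaped loosely like a JSON response (not real data)
sample = {
    'data': {
        'product': {
            'price': {'current_retail': 13.99},
            'children': [{'price': {'current_retail': 7.99}}],
        }
    }
}

matches = list(find_key(sample, 'current_retail'))
for path, value in matches:
    print(path, '->', value)
```

With real data you would pass the result of `response.json()` instead of `sample`; the printed path shows which keys/indexes lead to the value.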
Minimal working code. I found the keys manually.
import requests
url = 'https://redsky.target.com/redsky_aggregations/v1/web/pdp_client_v1?key=9f36aeafbe60771e321a7cc95a78140772ab3e96&tcin=80130848&is_bot=false&member_id=0&store_id=1771&pricing_store_id=1771&has_pricing_store_id=true&scheduled_delivery_store_id=1771&has_financing_options=true&visitor_id=01819D268B380201B177CA755BCE70CC&has_size_context=true&latitude=41.9831&longitude=-91.6686&zip=52404&state=IA' # JSON
response = requests.get(url)
data = response.json()
product = data['data']['product']
print('price:', product['price']['current_retail'])
print('title:', product['item']['product_description']['title'])
print('description:', product['item']['product_description']['downstream_description'])
print('------------')
for bullet in product['item']['product_description']['bullet_descriptions']:
    print(bullet)
print('------------')
print(product['item']['product_description']['soft_bullets']['title'])
for bullet in product['item']['product_description']['soft_bullets']['bullets']:
    print('-', bullet)
print('------------')
for attribute in product['item']['wellness_merchandise_attributes']:
    print('-', attribute['value_name'])
    print(' ', attribute['wellness_description'])
Result:
price: 13.99
title: Genexa Dextromethorphan Kids' Cough and Chest Congestion Suppressant - 4 fl oz
description: Genexa Kids’ Cough & Chest Congestion is real medicine, made clean - a powerful cough suppressant and expectorant that helps control cough, relieves chest congestion and helps thin and loosen mucus. This liquid, non-drowsy medicine has the same active ingredients you need (dextromethorphan HBr and guaifenesin), but without the artificial ones you don’t (dyes, common allergens, parabens). We only use ingredients people deserve to make the first gluten-free, non-GMO, certified vegan medicines to help your little ones feel better. <br /><br />Genexa is the first clean medicine company. Founded by two dads who believe in putting People Over Everything, Genexa makes medicine with the same active ingredients people need, but without the artificial ones they don’t. It’s real medicine, made clean.
------------
<B>Suggested Age:</B> 4 Years and Up
<B>Product Form:</B> Liquid
<B>Primary Active Ingredient:</B> Dextromethorphan
<B>Package Quantity:</B> 1
<B>Net weight:</B> 4 fl oz (US)
------------
highlights
- This is an age restricted item and will require us to take a quick peek at your ID upon pick-up
- Helps relieve kids’ chest congestion and makes coughs more productive by thinning and loosening mucus
- Non-drowsy so your little ones (ages 4+) can get back to playing
- Our medicine is junk-free, with no artificial sweeteners or preservatives, no dyes, no parabens, and no common allergens
- Certified gluten-free, vegan, and non-GMO
- Flavored with real organic blueberries
- Gentle on little tummies
------------
- Dye-Free
A product that either makes an unqualified on-pack statement indicating that it does not contain dye, or carries an unqualified on-pack statement such as "no dyes" or "dye-free."
- Gluten Free
A product that has an unqualified independent third-party certification, or carries an on-pack statement relating to the finished product being gluten-free.
- Non-GMO
A product that has an independent third-party certification, or carries an unqualified on-pack statement relating to the final product being made without genetically engineered ingredients.
- Vegan
A product that carries an unqualified independent, third-party certification, or carries on-pack statement relating to the product being 100% vegan.
- HSA/FSA Eligible
Restrictions apply; contact your insurance provider about plan allowances and requirements
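The `bullet_descriptions` lines in the output above still contain `<B>` tags. If you want them as label/value pairs, a regular expression may be enough. This is only a sketch: `parse_bullet` is a hypothetical helper, and it assumes every bullet looks like `<B>Label:</B> value`, which may not hold for all products.

```python
import re


def parse_bullet(text):
    """Split '<B>Label:</B> value' into (label, value); None if no match."""
    match = re.match(r'\s*<B>(.*?):?</B>\s*(.*)', text, flags=re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


# Lines copied from the output above
bullets = [
    '<B>Suggested Age:</B> 4 Years and Up',
    '<B>Product Form:</B> Liquid',
    '<B>Package Quantity:</B> 1',
]

# Skip bullets that do not follow the assumed pattern
specs = dict(pair for pair in map(parse_bullet, bullets) if pair)
print(specs)
```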
EDIT:
The "When purchased online" (or "at Cedar Rapids South") information comes from a different URL.
For example
Product url:
API product data:
API "at Cedar Rapids South":
But in some situations the page probably uses other information in the product data to display "When purchased online" instead of "at Cedar Rapids South", and this may be hardcoded in JavaScript. For example, the product which displays "When purchased online" has formatted_price $13.99, but the product which displays "at Cedar Rapids South" has formatted_price "See price in cart".
import requests
url = 'https://redsky.target.com/redsky_aggregations/v1/web/plp_search_v1?key=9f36aeafbe60771e321a7cc95a78140772ab3e96&brand_id=q643lel65ir&channel=WEB&count=24&default_purchasability_filter=true&offset=0&page=/b/q643lel65ir&platform=desktop&pricing_store_id=1771&store_ids=1771,1768,1113,3374,1792&useragent=Mozilla/5.0 (X11; Linux x86_64; rv:101.0) Gecko/20100101 Firefox/101.0&visitor_id=01819D268B380201B177CA755BCE70CC' # JSON
response = requests.get(url)
data = response.json()
for product in data['data']['search']['products']:
    print('title:', product['item']['product_description']['title'])
    print('price:', product['price']['current_retail'])
    print('formatted:', product['price']['formatted_current_price'])
    print('---')
Result:
title: Genexa Kids' Diphenhydramine Allergy Liquid Medicine - Organic Agave - 4 fl oz
price: 7.99
formatted: See price in cart
---
title: Genexa Dextromethorphan Kids' Cough and Chest Congestion Suppressant - 4 fl oz
price: 13.99
formatted: $13.99
---
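If the guess above is right, the label could be derived from `formatted_current_price`. The sketch below only encodes that guess. `guess_price_label` is a hypothetical helper, the store name is taken from the example, and the real rule used by the page's JavaScript is not confirmed.

```python
def guess_price_label(product, store_name='Cedar Rapids South'):
    """Guess the label under the price (assumption, not confirmed site logic)."""
    formatted = product.get('price', {}).get('formatted_current_price', '')
    if formatted.startswith('$'):
        # a normal price is shown, e.g. "$13.99"
        return 'When purchased online'
    # e.g. "See price in cart"
    return 'at ' + store_name


# Values copied from the result above
examples = [
    {'price': {'current_retail': 7.99, 'formatted_current_price': 'See price in cart'}},
    {'price': {'current_retail': 13.99, 'formatted_current_price': '$13.99'}},
]

labels = [guess_price_label(product) for product in examples]
for label in labels:
    print(label)
```

In the loop over `data['data']['search']['products']` you could call `guess_price_label(product)` for each item and compare the result with what the page shows.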