For instance, on this page: https://www.amazon.com/Lexani-LXUHP-207-All-Season-Radial-Tire-245/dp/B07FFH8F9V/
I right-click, choose "Inspect", and find the element that I'm interested in:
<span id="productTitle" > Lexani LXUHP-207 Performance Radial Tire - 245/45R18 100W </span>
Here's the deal: I want to copy the entire element, tags included, not just the title text "Lexani LXUHP-207 Performance Radial Tire - 245/45R18 100W". Can someone tell me how to do this in BeautifulSoup or rvest?
I am learning Python and R. I tried to dig into it but couldn't get the raw HTML as a result.
CodePudding user response:
Amazon will probably serve you a captcha page, but if you get past that you can get what you want with:
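import requests
from bs4 import BeautifulSoup
url = 'https://www.amazon.com/Lexani-LXUHP-207-All-Season-Radial-Tire-245/dp/B07FFH8F9V/'
soup = BeautifulSoup(requests.get(url).text, 'lxml')
# find() returns the whole <span> element, or None if the id is not in the page (e.g. a captcha page)
the_entire_thing = soup.find(id='productTitle')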
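To see what find() gives you without fighting Amazon's captcha, here is a minimal sketch that parses a static copy of the span from the question. The wrapping <div> and the variable names are only for illustration, and it uses the built-in 'html.parser' so lxml is not required. The idea is that str() on the tag gives the full markup, while get_text() gives only the title.

```python
from bs4 import BeautifulSoup

# Static copy of the element from the question, so no network request is needed.
# The surrounding <div> is an assumption, added only to mimic a larger page.
html = (
    '<div><span id="productTitle"> Lexani LXUHP-207 Performance Radial Tire'
    ' - 245/45R18 100W </span></div>'
)

soup = BeautifulSoup(html, "html.parser")
tag = soup.find(id="productTitle")  # a bs4 Tag, or None if the id is absent

outer_html = str(tag)                 # whole element: opening tag, text, closing tag
text_only = tag.get_text(strip=True)  # just the title text, whitespace trimmed
attributes = tag.attrs                # dict of the element's attributes

print(outer_html)
print(text_only)
print(attributes)
```

On a real page you would check that tag is not None before calling str() or get_text() on it, since a captcha page will not contain the productTitle element.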
CodePudding user response:
In R you can just convert the node to a character vector:
library(rvest)
html <- minimal_html('<span id="productTitle" class="a-size-large product-title-word-break"> Lexani LXUHP-207 Performance Radial Tire - 245/45R18 100W </span>')
html_node <- html_element(html, "#productTitle")
as.character(html_node)
#> [1] "<span id=\"productTitle\" class=\"a-size-large product-title-word-break\"> Lexani LXUHP-207 Performance Radial Tire - 245/45R18 100W </span>"
Created on 2022-11-02 with reprex v2.0.2