How can I split the price to get one price

Time:03-17

How can I split the output into two parts and get only the first or the second part? I tried this, but it did not work:

all_original_price = [o.text.split('>').split('₹') for o in all_original_price]

from urllib.request import urlopen
from bs4 import BeautifulSoup as soup
import pandas as pd
import re

url = "https://www.amazon.in/s?k=smart+watch&page=1"

original_price =[]

amazon_data = urlopen(url)
amazon_html = amazon_data.read()
a_soup = soup(amazon_html,'html.parser')
all_original_price = a_soup.findAll('span',{'class':'a-price a-text-price'})
all_original_price = [o.text.split('>') for o in all_original_price]
for item in all_original_price:
    original_price.append(item)
print(original_price)


OUTPUT
[['₹4,999₹4,999'], ['₹6,400₹6,400'], ['₹4,999₹4,999'], ['₹5,999₹5,999'], ['₹4,999₹4,999'], ['₹5,999₹5,999'], ['₹3,999₹3,999'], ['₹6,990₹6,990'], ['₹7,999₹7,999'], ['₹1,599₹1,599'], ['₹5,999₹5,999'], ['₹4,999₹4,999'], ['₹5,999₹5,999'], ['₹4,999₹4,999'], ['₹4,999₹4,999'], ['₹9,999₹9,999'], ['₹6,999₹6,999']]

CodePudding user response:

Try:

url = "https://www.amazon.in/s?k=smart+watch&page=1"

original_price =[]

amazon_data = urlopen(url)
amazon_html = amazon_data.read()
a_soup = soup(amazon_html,'html.parser')
all_original_price = a_soup.findAll('span',{'class':'a-price a-text-price'})
all_original_price = [o.find('span', {'class': 'a-offscreen'}).text.split('>') for o in all_original_price]
for item in all_original_price:
    original_price.append(item[0])

Output:

>>> original_price
['₹4,999',
 '₹6,400',
 '₹4,999',
 '₹5,999',
 '₹4,999',
 '₹5,999',
 '₹3,999',
 '₹6,990',
 '₹7,999',
 '₹1,599',
 '₹5,999',
 '₹4,999',
 '₹5,999',
 '₹4,999',
 '₹4,999',
 '₹9,999',
 '₹6,990']

CodePudding user response:

Try this:

all_original_price = [o.text.split('>')[0].split('₹')[1:] for o in all_original_price]
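To see what splitting on the rupee sign yields, here is a minimal sketch on plain strings copied from the question's output, with no scraping involved. The names sample_texts and split_price are illustrative only and do not appear in the answer above. Each result holds both copies of the price without the currency sign, so you still need to index it to get a single price:

```python
# Sample strings copied from the question's output (each price appears twice).
sample_texts = ['₹4,999₹4,999', '₹6,400₹6,400', '₹1,599₹1,599']

def split_price(text):
    """Split a doubled price string on the rupee sign (hypothetical helper)."""
    # '₹4,999₹4,999'.split('₹') gives ['', '4,999', '4,999'];
    # the [1:] slice drops the empty string before the first sign.
    return text.split('₹')[1:]

parts = [split_price(t) for t in sample_texts]

# Pick the first copy and put the currency sign back.
first_prices = ['₹' + p[0] for p in parts]

print(parts)
print(first_prices)
```

Use p[1] instead of p[0] if you want the second copy.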

CodePudding user response:

You can simply use the re module. Since you have already imported it, the import line is omitted from the code below:

[re.search(r"₹\d+(,\d+)?", x[0]).group() for x in all_original_price]

Output

['₹4,999',
 '₹6,400',
 '₹4,999',
 '₹5,999',
 '₹4,999',
 '₹5,999',
 '₹3,999',
 '₹6,990',
 '₹7,999',
 '₹1,599',
 '₹5,999',
 '₹4,999',
 '₹5,999',
 '₹4,999',
 '₹4,999',
 '₹9,999',
 '₹6,999']
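
The regex approach can also be tried without hitting the site. The sketch below assumes the nested-list shape printed in the question; sample_prices, PRICE_RE and first_price are illustrative names, and the None fallback for strings without a price is an addition, not part of the answer above:

```python
import re

# Hypothetical sample in the nested-list shape printed in the question.
sample_prices = [['₹4,999₹4,999'], ['₹6,400₹6,400'], ['₹1,599₹1,599']]

# Rupee sign, one or more digits, then optionally a comma and more digits.
PRICE_RE = re.compile(r"₹\d+(,\d+)?")

def first_price(text):
    """Return the first price found in text, or None if there is none."""
    match = PRICE_RE.search(text)
    return match.group() if match else None

prices = [first_price(x[0]) for x in sample_prices]
print(prices)
```

Note that this pattern allows at most one comma group, so a price such as ₹1,00,000 would be cut short; changing the optional group to (,\d+)* would cover that case.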