I want to get a list of all of the option values (for example value="5653923ac7eb6e355878bfe6"), but I can't seem to use even the simplest parse methods with bs4.
I'm confused!
All this prints is empty brackets []
Here's my code:
dropdown_text = s.post(base_url + watchlist_url + get_user_watch_url, data=data)
#Get list of wanted market ids
soup = bs(dropdown_text.text,'lxml')
print(soup)
#pprint.pprint(soup)
test = soup.find('body')
print(test)
test = soup.findAll('option',{"value": 'US - Dow Jones Industrials Index'})[0]
print(test)
<html><body><p>{"Html":"\r\n\r\n\u003cinput name=\"__RequestVerificationToken\" type=\"hidden\" value=\"4kKsQ_RWnpHMezsKWCJpcnhN45rqz5wJ96JPRLcbA4r2x_Mpc8UlvP0SleR93TRexSqg1sYIBSLuIAkd1AfjQXM6q_lXjM3BkJXj9Hyn_OcKLvFCc3g25Fpv7pKrKM-3Mv1QpH7VzZRGzDLfm-gf4Q2\" /\u003e\r\n\r\n\u003cdiv class=\"inner-01\"\u003e\r\n\u003cform class=\"form form-01 try-001\" id=\"analysisForm\"\u003e\r\n Select a Market from Your Watchlist:\r\n \u003cdiv class=\"type\"\u003e\r\n \u003cselect id=\"WatchlistItem\" name=\"WatchlistItem\"\u003e\u003coption value=\"\"\u003eSelect Market\u003c/option\u003e\r\n\u003coption selected=\"selected\" value=\"In Order\"\u003eShow All - In Order\u003c/option\u003e\r\n\u003coption value=\"A-Z\"\u003eShow All - Alphabetical A-Z\u003c/option\u003e\r\n\u003coption value=\"Z-A\"\u003eShow All - Alphabetical Z-A\u003c/option\u003e\r\n\u003coption value=\"5653923ac7eb6e355878bfe6\"\u003eUS - S\u0026amp;P 500 Index\u003c/option\u003e\r\n\u003coption value=\"5a2b420decfdc711085dc51d\"\u003eBitcoin Per USD\u003c/option\u003e\r\n\u003coption value=\"5653923ac7eb6e355878bfeb\"\u003eUS - Dow Jones Industrials Index\u003c/option\u003e\r\n\u003coption value=\"5a8e2222ecfdc7197c27d1a3\"
etc
CodePudding user response:
@HedgeHog Good idea but it returns the exact same thing
Not exactly the same, now you operate on the correct soup
This is based on your example. The question may need some improvement because it is not exactly reproducible.
Main issues:
- Your example tries to operate on a dict / JSON-like string instead of on the HTML stored under its "Html" key.
- Your selection soup.findAll('option',{"value": 'US - Dow Jones Industrials Index'}) looks for a value attribute equal to US - Dow Jones Industrials Index, but there is none; that string is the text of the tag.
- In newer code, avoid the old findAll() syntax and use find_all() instead.
- For more, take a minute to check the docs.
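A minimal sketch of the first two issues, in case it helps to see them side by side. The raw string below is a shortened stand-in for the response in the question (built by hand here, not fetched from the site), and html.parser is used only to keep the sketch dependency-free:

```python
import json
from bs4 import BeautifulSoup

# Shortened stand-in for the response body: JSON whose "Html" field holds
# the markup, with < and > escaped as \u003c / \u003e like in the question.
markup = (
    '<select id="WatchlistItem">'
    '<option value="5653923ac7eb6e355878bfe6">US - S&amp;P 500 Index</option>'
    '<option value="5653923ac7eb6e355878bfeb">US - Dow Jones Industrials Index</option>'
    '</select>'
)
raw = json.dumps({"Html": markup}).replace("<", "\\u003c").replace(">", "\\u003e")

# Issue 1: parsing the raw JSON text as HTML finds no tags at all,
# because the angle brackets are still escaped.
wrong_soup = BeautifulSoup(raw, "html.parser")
print(wrong_soup.find_all("option"))

# Decode the JSON first, then parse only the "Html" field.
soup = BeautifulSoup(json.loads(raw)["Html"], "html.parser")

# Issue 2: the market name is the tag's text, not its value attribute.
by_attr = soup.find_all("option", {"value": "US - Dow Jones Industrials Index"})
by_text = soup.find_all("option", string="US - Dow Jones Industrials Index")
print(len(by_attr), by_text[0]["value"])
```

With requests, dropdown_text.json()['Html'] does the decoding step for you.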
The following should fix the behavior. I used CSS selectors instead of find_all() because they are handier for me:
...
dropdown_text = s.post(base_url + watchlist_url + get_user_watch_url, data=data)
# The response is JSON; the markup lives under its 'Html' key
soup = bs(dropdown_text.json()['Html'], 'lxml')
soup.select('option:-soup-contains("US - Dow Jones Industrials Index")')[0].get('value')
...
Example
from bs4 import BeautifulSoup
import requests
soup = BeautifulSoup({"Html":"\r\n\r\n\u003cinput name=\"__RequestVerificationToken\" type=\"hidden\" value=\"4kKsQ_RWnpHMezsKWCJpcnhN45rqz5wJ96JPRLcbA4r2x_Mpc8UlvP0SleR93TRexSqg1sYIBSLuIAkd1AfjQXM6q_lXjM3BkJXj9Hyn_OcKLvFCc3g25Fpv7pKrKM-3Mv1QpH7VzZRGzDLfm-gf4Q2\" /\u003e\r\n\r\n\u003cdiv class=\"inner-01\"\u003e\r\n\u003cform class=\"form form-01 try-001\" id=\"analysisForm\"\u003e\r\n Select a Market from Your Watchlist:\r\n \u003cdiv class=\"type\"\u003e\r\n \u003cselect id=\"WatchlistItem\" name=\"WatchlistItem\"\u003e\u003coption value=\"\"\u003eSelect Market\u003c/option\u003e\r\n\u003coption selected=\"selected\" value=\"In Order\"\u003eShow All - In Order\u003c/option\u003e\r\n\u003coption value=\"A-Z\"\u003eShow All - Alphabetical A-Z\u003c/option\u003e\r\n\u003coption value=\"Z-A\"\u003eShow All - Alphabetical Z-A\u003c/option\u003e\r\n\u003coption value=\"5653923ac7eb6e355878bfe6\"\u003eUS - S\u0026amp;P 500 Index\u003c/option\u003e\r\n\u003coption value=\"5a2b420decfdc711085dc51d\"\u003eBitcoin Per USD\u003c/option\u003e\r\n\u003coption value=\"5653923ac7eb6e355878bfeb\"\u003eUS - Dow Jones Industrials Index\u003c/option\u003e"}['Html'])
soup.select('option:-soup-contains("US - Dow Jones Industrials Index")')[0].get('value')
Output
5653923ac7eb6e355878bfeb
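Since the question asks for a list of all the market ids, here is a rough sketch that collects every option at once. The html string is a trimmed copy of the decoded markup from the question, and the 24-character hex pattern is only an assumption based on the ids visible there (it is used to skip entries such as "In Order" and "A-Z"):

```python
import re
from bs4 import BeautifulSoup

# Trimmed copy of the decoded "Html" field from the question
html = """<select id="WatchlistItem" name="WatchlistItem">
<option value="">Select Market</option>
<option selected="selected" value="In Order">Show All - In Order</option>
<option value="A-Z">Show All - Alphabetical A-Z</option>
<option value="5653923ac7eb6e355878bfe6">US - S&amp;P 500 Index</option>
<option value="5a2b420decfdc711085dc51d">Bitcoin Per USD</option>
<option value="5653923ac7eb6e355878bfeb">US - Dow Jones Industrials Index</option>
</select>"""

soup = BeautifulSoup(html, "html.parser")

# Assumption: real market ids are 24 lowercase hex characters
ID_PATTERN = re.compile(r"[0-9a-f]{24}")

# Map each market name to its id, skipping the sorting/placeholder options
markets = {
    opt.get_text(strip=True): opt["value"]
    for opt in soup.select("select#WatchlistItem option")
    if ID_PATTERN.fullmatch(opt.get("value", ""))
}

for name, market_id in markets.items():
    print(market_id, name)
```

Use list(markets.values()) if only the ids are needed.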