Html comes back weird after requests.post

Time:05-29

I want to get a list of all of the option values (e.g. value="5653923ac7eb6e355878bfe6"), but I can't seem to use even the simplest parse methods with bs4. I'm confused! All this prints is empty brackets: []

Here's my code, followed by what print(soup) outputs:

dropdown_text = s.post(base_url + watchlist_url + get_user_watch_url, data=data)
#Get list of wanted market ids
soup = bs(dropdown_text.text,'lxml')
print(soup)
#pprint.pprint(soup)
test = soup.find('body')
print(test)
test = soup.findAll('option',{"value": 'US - Dow Jones Industrials Index'})[0]
print(test)
<html><body><p>{"Html":"\r\n\r\n\u003cinput name=\"__RequestVerificationToken\" type=\"hidden\" value=\"4kKsQ_RWnpHMezsKWCJpcnhN45rqz5wJ96JPRLcbA4r2x_Mpc8UlvP0SleR93TRexSqg1sYIBSLuIAkd1AfjQXM6q_lXjM3BkJXj9Hyn_OcKLvFCc3g25Fpv7pKrKM-3Mv1QpH7VzZRGzDLfm-gf4Q2\" /\u003e\r\n\r\n\u003cdiv class=\"inner-01\"\u003e\r\n\u003cform class=\"form form-01 try-001\" id=\"analysisForm\"\u003e\r\n    Select a Market from Your Watchlist:\r\n    \u003cdiv class=\"type\"\u003e\r\n        \u003cselect id=\"WatchlistItem\" name=\"WatchlistItem\"\u003e\u003coption value=\"\"\u003eSelect Market\u003c/option\u003e\r\n\u003coption selected=\"selected\" value=\"In Order\"\u003eShow All - In Order\u003c/option\u003e\r\n\u003coption value=\"A-Z\"\u003eShow All - Alphabetical A-Z\u003c/option\u003e\r\n\u003coption value=\"Z-A\"\u003eShow All - Alphabetical Z-A\u003c/option\u003e\r\n\u003coption value=\"5653923ac7eb6e355878bfe6\"\u003eUS - S\u0026amp;P 500 Index\u003c/option\u003e\r\n\u003coption value=\"5a2b420decfdc711085dc51d\"\u003eBitcoin Per USD\u003c/option\u003e\r\n\u003coption value=\"5653923ac7eb6e355878bfeb\"\u003eUS - Dow Jones Industrials Index\u003c/option\u003e\r\n\u003coption value=\"5a8e2222ecfdc7197c27d1a3\"
... (output truncated)

Answer:

@HedgeHog Good idea, but it returns the exact same thing.

Not exactly the same: now you operate on the correct soup, based on your example. The question may need some improvement, because it is not exactly reproducible.

Main issues:

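  1. Your example tries to parse a dict / JSON-like string. The response body is JSON, and the markup sits, escaped, under its "Html" key, so lxml just wraps the whole JSON text in <html><body><p> and there are no real <option> tags to find.
  2. Your selection soup.findAll('option',{"value": 'US - Dow Jones Industrials Index'}) looks for an option whose value attribute equals US - Dow Jones Industrials Index, but there is none: that string is the text of the tag, not its value.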
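To see issue 1 in isolation, here is a minimal sketch using a hypothetical, trimmed response body (raw_body) shaped like the one printed in the question, with the markup escaped as \u003c inside a JSON string. It uses the stdlib html.parser instead of lxml to avoid an extra dependency; lxml would additionally wrap the text in <html><body><p>, as seen in the question's output.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical, trimmed response body: a JSON object whose "Html" key holds
# markup with "<" and ">" escaped as \u003c and \u003e (raw string keeps the escapes).
raw_body = r'{"Html":"\u003cselect id=\"WatchlistItem\"\u003e\u003coption value=\"\"\u003eSelect Market\u003c/option\u003e\u003coption value=\"5653923ac7eb6e355878bfeb\"\u003eUS - Dow Jones Industrials Index\u003c/option\u003e\u003c/select\u003e"}'

# Parsing the raw JSON text: the markup is only escaped text, so no <option> tags exist.
wrong = BeautifulSoup(raw_body, "html.parser")
print(wrong.find_all("option"))  # []

# Decoding the JSON first and parsing the "Html" value yields real tags.
right = BeautifulSoup(json.loads(raw_body)["Html"], "html.parser")
print(len(right.find_all("option")))  # 2
```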

In new code, avoid the old findAll() syntax and use find_all() instead. For more, take a minute to check the docs.
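For issue 2, a find_all() sketch that matches on the option's text via the string argument, rather than on its value attribute. The markup is a hypothetical, trimmed copy of the options from the question, already decoded from JSON.

```python
from bs4 import BeautifulSoup

# Hypothetical trimmed markup, mirroring the options in the question.
html = (
    '<select id="WatchlistItem" name="WatchlistItem">'
    '<option value="">Select Market</option>'
    '<option value="5653923ac7eb6e355878bfe6">US - S&amp;P 500 Index</option>'
    '<option value="5653923ac7eb6e355878bfeb">US - Dow Jones Industrials Index</option>'
    '</select>'
)
soup = BeautifulSoup(html, "html.parser")

# Matching on the value attribute finds nothing: the market name is not a value.
by_attr = soup.find_all("option", {"value": "US - Dow Jones Industrials Index"})
print(by_attr)  # []

# Matching on the tag's text with string= finds the option; its value is the id.
by_text = soup.find_all("option", string="US - Dow Jones Industrials Index")
print(by_text[0]["value"])  # 5653923ac7eb6e355878bfeb
```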

The following should fix the behavior. I used CSS selectors instead of find_all() because they are handier for me:

...
dropdown_text = s.post(base_url + watchlist_url + get_user_watch_url, data=data)

# The response body is JSON; the markup lives under its "Html" key
soup = bs(dropdown_text.json()['Html'], 'lxml')
print(soup.select('option:-soup-contains("US - Dow Jones Industrials Index")')[0].get('value'))
...
Example
from bs4 import BeautifulSoup
import requests

soup = BeautifulSoup({"Html":"\r\n\r\n\u003cinput name=\"__RequestVerificationToken\" type=\"hidden\" value=\"4kKsQ_RWnpHMezsKWCJpcnhN45rqz5wJ96JPRLcbA4r2x_Mpc8UlvP0SleR93TRexSqg1sYIBSLuIAkd1AfjQXM6q_lXjM3BkJXj9Hyn_OcKLvFCc3g25Fpv7pKrKM-3Mv1QpH7VzZRGzDLfm-gf4Q2\" /\u003e\r\n\r\n\u003cdiv class=\"inner-01\"\u003e\r\n\u003cform class=\"form form-01 try-001\" id=\"analysisForm\"\u003e\r\n    Select a Market from Your Watchlist:\r\n    \u003cdiv class=\"type\"\u003e\r\n        \u003cselect id=\"WatchlistItem\" name=\"WatchlistItem\"\u003e\u003coption value=\"\"\u003eSelect Market\u003c/option\u003e\r\n\u003coption selected=\"selected\" value=\"In Order\"\u003eShow All - In Order\u003c/option\u003e\r\n\u003coption value=\"A-Z\"\u003eShow All - Alphabetical A-Z\u003c/option\u003e\r\n\u003coption value=\"Z-A\"\u003eShow All - Alphabetical Z-A\u003c/option\u003e\r\n\u003coption value=\"5653923ac7eb6e355878bfe6\"\u003eUS - S\u0026amp;P 500 Index\u003c/option\u003e\r\n\u003coption value=\"5a2b420decfdc711085dc51d\"\u003eBitcoin Per USD\u003c/option\u003e\r\n\u003coption value=\"5653923ac7eb6e355878bfeb\"\u003eUS - Dow Jones Industrials Index\u003c/option\u003e"}['Html'])

print(soup.select('option:-soup-contains("US - Dow Jones Industrials Index")')[0].get('value'))
Output
5653923ac7eb6e355878bfeb
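Since the original goal was a list of all the market ids, here is a sketch that collects them from the decoded markup. It assumes, based only on the sample response, that real market ids are 24-character lowercase hex strings, which separates them from the sorting entries ("", "In Order", "A-Z", "Z-A"). The markup string is a trimmed, hypothetical copy of the question's options.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical trimmed markup, mirroring the options in the question.
html = (
    '<select id="WatchlistItem" name="WatchlistItem">'
    '<option value="">Select Market</option>'
    '<option selected="selected" value="In Order">Show All - In Order</option>'
    '<option value="A-Z">Show All - Alphabetical A-Z</option>'
    '<option value="Z-A">Show All - Alphabetical Z-A</option>'
    '<option value="5653923ac7eb6e355878bfe6">US - S&amp;P 500 Index</option>'
    '<option value="5a2b420decfdc711085dc51d">Bitcoin Per USD</option>'
    '<option value="5653923ac7eb6e355878bfeb">US - Dow Jones Industrials Index</option>'
    '</select>'
)
soup = BeautifulSoup(html, "html.parser")

# Assumption: market ids are 24-character lowercase hex strings.
market_id = re.compile(r"[0-9a-f]{24}")
options = [
    opt for opt in soup.select("#WatchlistItem option[value]")
    if market_id.fullmatch(opt["value"])
]

# List of ids, in document order.
ids = [opt["value"] for opt in options]
print(ids)  # ['5653923ac7eb6e355878bfe6', '5a2b420decfdc711085dc51d', '5653923ac7eb6e355878bfeb']

# Market name -> id mapping (entities such as &amp; are decoded by the parser).
names = {opt.get_text(): opt["value"] for opt in options}
```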