How to scrape a web page with a button/menu-item option value?

Time:08-06

In particular, I'm trying to scrape this website: [image]

My current code is the following:

Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[@class='btn btn-default dropdown-toggle']")))).select_by_visible_text('50')

Where is my mistake? Can you help me?

Thank you in advance for your time!

CodePudding user response:

You can try this simpler code, which doesn't need Selenium and instead calls the site's data API directly with requests.

Note the limit argument at the end of the query string, which caps the response at 50 rows, as you want. To scrape the next 50 items, increase offset to 50, then 100, 150, and so on, until you have all the available data.

import json

import requests
import pandas as pd

# Query the site's data API directly; offset and limit control paging.
url = "https://whalewisdom.com/filer/holdings?id=berkshire-hathaway-inc&q1=-1&type_filter=1,2,3,4&symbol=&change_filter=&minimum_ranking=&minimum_shares=&is_etf=0&sc=true&sort=current_mv&order=desc&offset=0&limit=50"
raw = requests.get(url)

# Parse the JSON response and load its "rows" list into a DataFrame.
data = json.loads(raw.content)
df = pd.DataFrame(data["rows"])
df.head()

Output:

    symbol  permalink   security_type   name    sector  industry    current_shares  previous_shares     shares_change   position_change_type    ...     percent_ownership   quarter_first_owned     quarter_id_owned    source_type     source_date     filing_date     avg_price   recent_price    quarter_end_price   id
0   AAPL    aapl    SH  Apple Inc   INFORMATION TECHNOLOGY  COMPUTERS & PERIPHERALS     8.909234e+08    8.871356e+08    3787856.0   addition    ...     5.5045625   Q1 2016     61  13F     2022-03-31  2022-05-16  36.6604     160.01  174.61  None
1   BAC     bac     SH  Bank of America Corp. (North Carolina National...   FINANCE     BANKS   1.010101e+09    1.010101e+09    0.0     None    ...     12.5371165  Q3 2017     67  13F     2022-03-31  2022-05-16  25.5185     33.04   41.22   None
2   AXP     axp     SH  American Express Co     FINANCE     CONSUMER FINANCE    1.516107e+08    1.516107e+08    0.0     None    ...     20.1326115  Q1 2001     1   13F     2022-03-31  2022-05-16  39.3110     151.60  187.00  None
3   CVX     cvx     SH  Chevron Corp. (Standard Oil of California)  ENERGY  INTEGRATED OIL & GAS    1.591781e+08    3.824504e+07    120933081.0     addition    ...     8.1014366   Q4 2020     80  13F     2022-03-31  2022-05-16  125.3424    159.14  162.83  None
4   KO  ko  SH  Coca Cola Co.   CONSUMER STAPLES
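
The offset-based paging described above could be sketched roughly as follows. This is only an illustration: the helper names (page_url, fetch_rows, fetch_all_rows), the max_pages cap, and the stopping rule are hypothetical, and it assumes the API returns a short or empty "rows" list once the data runs out, which has not been verified against the site.

```python
# Query string from the answer above, without the offset and limit parameters.
BASE_URL = (
    "https://whalewisdom.com/filer/holdings?id=berkshire-hathaway-inc&q1=-1"
    "&type_filter=1,2,3,4&symbol=&change_filter=&minimum_ranking=&minimum_shares="
    "&is_etf=0&sc=true&sort=current_mv&order=desc"
)


def page_url(offset, limit=50):
    """Return the query URL for one page of results."""
    return f"{BASE_URL}&offset={offset}&limit={limit}"


def fetch_rows(offset, limit=50):
    """Download one page and return its list of rows (needs network access)."""
    import requests  # imported here so the paging logic can be tried offline

    response = requests.get(page_url(offset, limit), timeout=30)
    response.raise_for_status()
    return response.json()["rows"]


def fetch_all_rows(fetch=fetch_rows, limit=50, max_pages=100):
    """Collect rows page by page until a page comes back short or empty."""
    all_rows = []
    for page in range(max_pages):
        rows = fetch(page * limit, limit)
        all_rows.extend(rows)
        # Assumption: a page with fewer than `limit` rows is the last one.
        if len(rows) < limit:
            break
    return all_rows
```

With network access, pd.DataFrame(fetch_all_rows()) should then give a single table instead of one page at a time. The fetch parameter is there only so the loop can be exercised with a stand-in function instead of real requests.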

CodePudding user response:

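You're passing a non-select node to a Select instance. This won't work, because Select only supports <select> elements, and the dropdown here is a button.

Try this code instead, which clicks the dropdown button and then the "50" entry: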

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[@class='btn btn-default dropdown-toggle']"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.LINK_TEXT, "50"))).click()