In particular, I am trying to scrape this website:
My current code is the following:
Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[@class='btn btn-default dropdown-toggle']")))).select_by_visible_text('50')
Where is my mistake? Can you help me?
Thank you in advance for your time!
CodePudding user response:
You can try this simpler code, which doesn't need Selenium: it calls the site's data API directly with requests.
Note the limit argument at the end of the query string, which sets the limit to 50 rows, as you want. To scrape the next 50 items, increase the offset to 50, then 100, 150, and so on. This will get you all the available data.
import json
import requests
import pandas as pd
# Data API endpoint; offset and limit at the end of the query string control paging
url = "https://whalewisdom.com/filer/holdings?id=berkshire-hathaway-inc&q1=-1&type_filter=1,2,3,4&symbol=&change_filter=&minimum_ranking=&minimum_shares=&is_etf=0&sc=true&sort=current_mv&order=desc&offset=0&limit=50"
raw = requests.get(url)
# The response body is JSON; the table rows are under the "rows" key
data = json.loads(raw.content)
df = pd.DataFrame(data["rows"])
df.head()
Output:
symbol permalink security_type name sector industry current_shares previous_shares shares_change position_change_type ... percent_ownership quarter_first_owned quarter_id_owned source_type source_date filing_date avg_price recent_price quarter_end_price id
0 AAPL aapl SH Apple Inc INFORMATION TECHNOLOGY COMPUTERS & PERIPHERALS 8.909234e+08 8.871356e+08 3787856.0 addition ... 5.5045625 Q1 2016 61 13F 2022-03-31 2022-05-16 36.6604 160.01 174.61 None
1 BAC bac SH Bank of America Corp. (North Carolina National... FINANCE BANKS 1.010101e+09 1.010101e+09 0.0 None ... 12.5371165 Q3 2017 67 13F 2022-03-31 2022-05-16 25.5185 33.04 41.22 None
2 AXP axp SH American Express Co FINANCE CONSUMER FINANCE 1.516107e+08 1.516107e+08 0.0 None ... 20.1326115 Q1 2001 1 13F 2022-03-31 2022-05-16 39.3110 151.60 187.00 None
3 CVX cvx SH Chevron Corp. (Standard Oil of California) ENERGY INTEGRATED OIL & GAS 1.591781e+08 3.824504e+07 120933081.0 addition ... 8.1014366 Q4 2020 80 13F 2022-03-31 2022-05-16 125.3424 159.14 162.83 None
4 KO ko SH Coca Cola Co. CONSUMER STAPLES
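If you want every page rather than just the first 50 rows, the offset stepping described above can be wrapped in a loop. The following is only a rough sketch: the helper names fetch_page and fetch_all are made up for illustration, and it assumes the endpoint returns an empty or short "rows" list once the offset runs past the last record, which I have not verified.

```python
import requests

# Same query string as in the snippet above, with offset and limit left as placeholders
URL_TEMPLATE = (
    "https://whalewisdom.com/filer/holdings?id=berkshire-hathaway-inc&q1=-1"
    "&type_filter=1,2,3,4&symbol=&change_filter=&minimum_ranking="
    "&minimum_shares=&is_etf=0&sc=true&sort=current_mv&order=desc"
    "&offset={offset}&limit={limit}"
)


def fetch_page(offset, limit):
    """Download one page of rows starting at the given offset (hypothetical helper)."""
    response = requests.get(URL_TEMPLATE.format(offset=offset, limit=limit), timeout=30)
    response.raise_for_status()
    return response.json()["rows"]


def fetch_all(fetch=fetch_page, limit=50, max_pages=200):
    """Collect rows page by page: offset 0, then limit, 2 * limit, and so on."""
    rows = []
    for page in range(max_pages):
        batch = fetch(page * limit, limit)
        rows.extend(batch)
        # Assumption: an empty or short page means there is nothing left to fetch
        if len(batch) < limit:
            break
    return rows


if __name__ == "__main__":
    all_rows = fetch_all()
    print(len(all_rows), "rows downloaded")
```

The returned list can be passed to pd.DataFrame exactly like data["rows"] above. The max_pages cap is only a safety net so the loop cannot run forever if the API behaves differently than assumed.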
CodePudding user response:
You're passing a non-select node to the Select class. This won't work, because Select only supports <select> elements.
Try this code instead:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[@class='btn btn-default dropdown-toggle']"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.LINK_TEXT, "50"))).click()