I'm trying to scrape this site URL: https://statusinvest.com.br/fundos-imobiliarios/urpr11 to get the dividends info for this specific REIT from a table (I'll generalize this later). This is the table that contains the info:
I was able to get the dates and values from the table, but only for the first page. When I change the table page, the website URL doesn't change, so I don't know how to deal with this. Any help would be appreciated.
Note: It would be nice if the solution didn't depend on the number of pages, because some REITs can have more than 2 pages of info.
This is how I'm currently extracting the info from the first page:
```python
from bs4 import BeautifulSoup
import requests

URL = "https://statusinvest.com.br/fundos-imobiliarios/urpr11"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

test = soup.find_all("tr", class_="")
rows = []
for r in test:
    if not r.find("td", title="Rendimento"):
        continue
    row = []
    for child in r.findChildren():
        if child.text.lower() == "rendimento":
            continue
        print(child.text)
        row.append(child.text)
    rows.append(row)
```
CodePudding user response:
The table content is provided dynamically by JavaScript, which `requests` by itself does not render, so you won't get all the data that way.
How to fix?
You could use `selenium` to interact with the website the way a human would in the browser - something for later and more complicated cases. But here it is much simpler and you do not need `selenium`: just grab the JSON data that the JavaScript uses to fill the table:
```python
data = json.loads(soup.select_one('#results')['value'])
```
Convert it into a `DataFrame`, adjust it to your needs, and save it to CSV, JSON, ...:

```python
pd.DataFrame(data).to_csv('yourFile.csv', index=False)
```
There are more columns in the JSON than are displayed on the website; take a look at the output of the example. These adjustments will give you the expected ones, by reading only specific keys and renaming the column headers:
```python
df = pd.DataFrame(data, columns=['et', 'ed', 'pd', 'v'])
df.columns = ['TIPO', 'DATA COM', 'PAGAMENTO', 'VALOR']
df.to_csv('yourFile.csv', index=False)
```
| TIPO | DATA COM | PAGAMENTO | VALOR |
|---|---|---|---|
| Rendimento | 25/02/2022 | 15/03/2022 | 1.635 |
| Rendimento | 31/01/2022 | 14/02/2022 | 1.63 |
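If you plan to do date filtering or calculations later, it may also help to convert the string columns to proper dtypes. A minimal sketch, assuming the renamed frame from the snippet above (the sample rows here are hard-coded just for illustration):

```python
import pandas as pd

# Sample rows in the shape produced by the renaming step above
df = pd.DataFrame({
    'TIPO': ['Rendimento', 'Rendimento'],
    'DATA COM': ['25/02/2022', '31/01/2022'],
    'PAGAMENTO': ['15/03/2022', '14/02/2022'],
    'VALOR': [1.635, 1.63],
})

# Parse the Brazilian dd/mm/yyyy date strings into real datetimes
for col in ('DATA COM', 'PAGAMENTO'):
    df[col] = pd.to_datetime(df[col], format='%d/%m/%Y')

# Make sure VALOR is numeric (guards against string values in the JSON)
df['VALOR'] = pd.to_numeric(df['VALOR'])

print(df.dtypes)
```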
Example
```python
from bs4 import BeautifulSoup
import requests, json
import pandas as pd

URL = "https://statusinvest.com.br/fundos-imobiliarios/urpr11"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

data = json.loads(soup.select_one('#results')['value'])
pd.DataFrame(data)

# or with adjustments as mentioned above
# df = pd.DataFrame(data, columns=['et', 'ed', 'pd', 'v'])
# df.columns = ['TIPO', 'DATA COM', 'PAGAMENTO', 'VALOR']
# df.to_csv('yourFile.csv', index=False)
```
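Since you mentioned wanting to generalize this to other REITs, the flow can be wrapped in a function that takes the ticker. This is a sketch: the HTML-parsing part is split out so it can be tested without hitting the network, and the browser-like `user-agent` header is an assumption (some sites reject the default `requests` one):

```python
import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Mapping from JSON keys to the column names used above
COLUMNS = {'et': 'TIPO', 'ed': 'DATA COM', 'pd': 'PAGAMENTO', 'v': 'VALOR'}


def parse_dividends(html: str) -> pd.DataFrame:
    """Extract the dividends JSON from the hidden #results input."""
    soup = BeautifulSoup(html, "html.parser")
    data = json.loads(soup.select_one('#results')['value'])
    df = pd.DataFrame(data, columns=list(COLUMNS))
    return df.rename(columns=COLUMNS)


def get_dividends(ticker: str) -> pd.DataFrame:
    """Fetch a REIT's page and return its dividends table."""
    url = f"https://statusinvest.com.br/fundos-imobiliarios/{ticker.lower()}"
    # Assumption: a browser-like user-agent avoids being blocked
    resp = requests.get(url, headers={"user-agent": "Mozilla/5.0"})
    resp.raise_for_status()
    return parse_dividends(resp.text)


# Demo with a minimal synthetic page (no network needed):
sample = ('<input id="results" value=\'[{"et": "Rendimento", '
          '"ed": "25/02/2022", "pd": "15/03/2022", "v": 1.635}]\'>')
print(parse_dividends(sample))
```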
Output
| y | m | d | ad | ed | pd | et | etd | v | ov | sv | sov | adj |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 |  | 25/02/2022 | 15/03/2022 | Rendimento | Rendimento | 1.635 | 1,63500000 |  | - | False |
| 0 | 0 | 0 |  | 31/01/2022 | 14/02/2022 | Rendimento | Rendimento | 1.63 | 1,63000000 |  | - | False |
| 0 | 0 | 0 |  | 30/12/2021 | 14/01/2022 | Rendimento | Rendimento | 1.67 | 1,67000000 |  | - | False |
| 0 | 0 | 0 |  | 30/11/2021 | 14/12/2021 | Rendimento | Rendimento | 1.869 | 1,86900000 |  | - | False |
| 0 | 0 | 0 |  | 29/10/2021 | 16/11/2021 | Rendimento | Rendimento | 1.37 | 1,37000000 |  | - | False |
| 0 | 0 | 0 |  | 30/09/2021 | 15/10/2021 | Rendimento | Rendimento | 2.17 | 2,17000000 |  | - | False |
| 0 | 0 | 0 |  | 31/08/2021 | 15/09/2021 | Rendimento | Rendimento | 2.01 | 2,01000000 |  | - | False |
| 0 | 0 | 0 |  | 30/07/2021 | 13/08/2021 | Rendimento | Rendimento | 1.48 | 1,48000000 |  | - | False |
| 0 | 0 | 0 |  | 30/06/2021 | 14/07/2021 | Rendimento | Rendimento | 2.4 | 2,40000000 |  | - | False |
| 0 | 0 | 0 |  | 31/05/2021 | 15/06/2021 | Rendimento | Rendimento | 2.06 | 2,06000000 |  | - | False |
| 0 | 0 | 0 |  | 30/04/2021 | 14/05/2021 | Rendimento | Rendimento | 1.185 | 1,18500000 |  | - | False |
| 0 | 0 | 0 |  | 31/03/2021 | 15/04/2021 | Rendimento | Rendimento | 2.87 | 2,87000000 |  | - | False |
| 0 | 0 | 0 |  | 26/02/2021 | 12/03/2021 | Rendimento | Rendimento | 2.09 | 2,09000000 |  | - | False |
| 0 | 0 | 0 |  | 29/01/2021 | 12/02/2021 | Rendimento | Rendimento | 2.25 | 2,25000000 |  | - | False |
| 0 | 0 | 0 |  | 30/12/2020 | 15/01/2021 | Rendimento | Rendimento | 2.01 | 2,01000000 |  | - | False |
| 0 | 0 | 0 |  | 30/11/2020 | 14/12/2020 | Rendimento | Rendimento | 2.03668 | 2,03668260 |  | - | False |
| 0 | 0 | 0 |  | 30/10/2020 | 13/11/2020 | Rendimento | Rendimento | 3.24 | 3,24000000 |  | - | False |
| 0 | 0 | 0 |  | 30/09/2020 | 15/10/2020 | Rendimento | Rendimento | 2.15 | 2,15000000 |  | - | False |
| 0 | 0 | 0 |  | 31/08/2020 | 15/09/2020 | Rendimento | Rendimento | 1.35 | 1,35000000 |  | - | False |
| 0 | 0 | 0 |  | 31/07/2020 | 14/08/2020 | Rendimento | Rendimento | 0.814098 | 0,81409811 |  | - | False |
| 0 | 0 | 0 |  | 30/06/2020 | 15/07/2020 | Rendimento | Rendimento | 1.56063 | 1,56063128 |  | - | False |
| 0 | 0 | 0 |  | 29/05/2020 | 15/06/2020 | Rendimento | Rendimento | 0.778074 | 0,77807445 |  | - | False |
| 0 | 0 | 0 |  | 30/04/2020 | 11/05/2020 | Rendimento | Rendimento | 0.615445 | 0,61544523 |  | - | False |
| 0 | 0 | 0 |  | 14/04/2020 | 15/04/2020 | Rendimento | Rendimento | 0.189474 | 0,18947368 |  | - | False |
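Once the frame is cleaned up, pandas makes aggregations over the history straightforward. A sketch, assuming the renamed columns from above (the rows here are a hard-coded subset of the output, just for illustration):

```python
import pandas as pd

# A few rows from the output above, renamed as in the answer
df = pd.DataFrame({
    'TIPO': ['Rendimento'] * 4,
    'DATA COM': ['25/02/2022', '31/01/2022', '30/12/2021', '30/11/2021'],
    'PAGAMENTO': ['15/03/2022', '14/02/2022', '14/01/2022', '14/12/2021'],
    'VALOR': [1.635, 1.63, 1.67, 1.869],
})

df['PAGAMENTO'] = pd.to_datetime(df['PAGAMENTO'], format='%d/%m/%Y')

# Total dividends paid per calendar year
per_year = df.groupby(df['PAGAMENTO'].dt.year)['VALOR'].sum()
print(per_year)
```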