I used BeautifulSoup to scrape https://www.rava.com/perfil/CEDEARAAPL, but they recently changed the page's source code and I'm having trouble finding the marker for the download button shown in the attached image.
I've tried changing attributes and markers with soup.find_all, with no luck.
Any help will be appreciated. Thanks,
import requests
from bs4 import BeautifulSoup

url = 'https://www.rava.com/perfil/CEDEARAAPL'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
main_links = soup.find_all('div', attrs={'class': 'download'})
print(main_links)

which prints an empty list:

[]
CodePudding user response:
The data comes from an API endpoint: you can find it in the browser's Network tab when you click that 'get excel' button. The code below retrieves the data and puts it into a DataFrame, which can easily be exported to CSV if that's what you want.
import requests
import pandas as pd

data = {
    'access_token': '5c51a8eef05c13657876d39fc3f8acbb197c57f3',
    'especie': 'CEDEARAAPL',
    'fecha_inicio': '0000-00-00',
    'fecha_fin': '2022-07-21',
}

r = requests.post('https://clasico.rava.com/lib/restapi/v3/publico/cotizaciones/historicos', data=data)
df = pd.DataFrame(r.json()['body'])
df.to_csv('the_data_needed.csv')
df
Response: 2598 rows × 8 columns

|   | especie    | fecha      | apertura | maximo  | minimo  | cierre  | volumen | timestamp  |
|---|------------|------------|----------|---------|---------|---------|---------|------------|
| 0 | CEDEARAAPL | 2011-06-27 | 5.15357  | 5.15357 | 5.15357 | 5.15357 | 6720    | 1309143600 |
| 1 | CEDEARAAPL | 2011-06-29 | 5.11250  | 5.11250 | 5.11250 | 5.11250 | 35280   | 1309316400 |
| 2 | CEDEARAAPL | 2011-07-01 | 5.17857  | 5.17857 | 5.17857 | 5.17857 | 1680    | 1309489200 |
| 3 | CEDEARAAPL | 2011-07-21 | 5.84821  | 5.84821 | 5.82142 | 5.82142 | 38640   | 1311217200 |
| 4 | CEDEARAAPL | 2011-07-22 | 5.95535  | 5.95535 | 5.95535 | 5.95535 | 1400    | 1311303600 |
| ... | ...      | ...        | ...      | ...     | ...     | ...     | ...     | ...        |
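Once the rows are in a DataFrame, it is usually convenient to parse the `fecha` column into a proper DatetimeIndex before analysis. A minimal sketch, using two made-up rows shaped like the API's `body` field rather than a live request:

```python
import pandas as pd

# Synthetic rows mimicking the API's "body" field (values copied from
# the sample output above; the real response has ~2600 rows).
body = [
    {"especie": "CEDEARAAPL", "fecha": "2011-06-27", "apertura": 5.15357,
     "maximo": 5.15357, "minimo": 5.15357, "cierre": 5.15357,
     "volumen": 6720, "timestamp": 1309143600},
    {"especie": "CEDEARAAPL", "fecha": "2011-06-29", "apertura": 5.11250,
     "maximo": 5.11250, "minimo": 5.11250, "cierre": 5.11250,
     "volumen": 35280, "timestamp": 1309316400},
]

df = pd.DataFrame(body)
# Parse the date strings and use them as the index, which makes
# slicing, resampling, and plotting straightforward.
df["fecha"] = pd.to_datetime(df["fecha"])
df = df.set_index("fecha").sort_index()
print(df.index.dtype)  # datetime64[ns]
```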
CodePudding user response:
If the access_token changes, you can use the next example to scrape it from the page and load the data into a pandas DataFrame:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.rava.com/perfil/CEDEARAAPL"
api_url = "https://clasico.rava.com/lib/restapi/v3/publico/cotizaciones/historicos"

# The token is embedded as a prop of the <navbar-c> component in the page's HTML:
soup = BeautifulSoup(requests.get(url).content, "html.parser")
access_token = soup.find("navbar-c")[":access_token"].strip("'")

payload = {
    "access_token": access_token,
    "especie": "CEDEARAAPL",
    "fecha_inicio": "0000-00-00",
    "fecha_fin": "2022-07-22",
}

df = pd.DataFrame(requests.post(api_url, data=payload).json()["body"])
print(df.head(5).to_markdown(index=False))
Prints:
| especie    | fecha      | apertura | maximo  | minimo  | cierre  | volumen | timestamp  |
|------------|------------|----------|---------|---------|---------|---------|------------|
| CEDEARAAPL | 2011-06-27 | 5.15357  | 5.15357 | 5.15357 | 5.15357 | 6720    | 1309143600 |
| CEDEARAAPL | 2011-06-29 | 5.1125   | 5.1125  | 5.1125  | 5.1125  | 35280   | 1309316400 |
| CEDEARAAPL | 2011-07-01 | 5.17857  | 5.17857 | 5.17857 | 5.17857 | 1680    | 1309489200 |
| CEDEARAAPL | 2011-07-21 | 5.84821  | 5.84821 | 5.82142 | 5.82142 | 38640   | 1311217200 |
| CEDEARAAPL | 2011-07-22 | 5.95535  | 5.95535 | 5.95535 | 5.95535 | 1400    | 1311303600 |
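The token-extraction line works because BeautifulSoup treats the Vue prop `:access_token` as an ordinary HTML attribute, and `.strip("'")` removes the single quotes inside the attribute value. A self-contained demonstration on a minimal snippet (the token value here is made up for illustration):

```python
from bs4 import BeautifulSoup

# Minimal HTML mimicking the relevant part of the page: the <navbar-c>
# component carries the token in its :access_token prop, quoted twice
# (HTML double quotes around a single-quoted JS string literal).
html = '<navbar-c :access_token="\'abc123\'"></navbar-c>'

soup = BeautifulSoup(html, "html.parser")
# find("navbar-c") locates the custom tag; the attribute lookup returns
# the value "'abc123'", and strip("'") drops the inner quotes.
access_token = soup.find("navbar-c")[":access_token"].strip("'")
print(access_token)  # abc123
```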