Troubles w/ BeautifulSoup

Time:07-22

I used BeautifulSoup to scrape this page, https://www.rava.com/perfil/CEDEARAAPL, but they recently changed the source code and I'm having trouble finding the selector for the download button shown in the attached image.

I've tried changing attributes and tags with soup.findAll, with no luck.

Any help will be appreciated. Thanks,

import requests
from bs4 import BeautifulSoup

url='https://www.rava.com/perfil/CEDEARAAPL'

response = requests.get(url)

soup = BeautifulSoup(response.text, "html.parser")

main_links = soup.findAll('div', attrs={'class':'download'})

print(main_links)
[]

[Image: the download button in the browser's web inspector]

CodePudding user response:

The data is retrieved from an API. You can find the request in the browser's Network tab when you click the 'get excel' button. Here is one way to retrieve the data and put it into a DataFrame, which can easily be exported to CSV if that's what you want.

import requests
import pandas as pd

data = {'access_token':'5c51a8eef05c13657876d39fc3f8acbb197c57f3',
        'especie':'CEDEARAAPL',
        'fecha_inicio':'0000-00-00',
        'fecha_fin':'2022-07-21'}

r = requests.post('https://clasico.rava.com/lib/restapi/v3/publico/cotizaciones/historicos', data=data)

df = pd.DataFrame(r.json()['body'])
df.to_csv('the_data_needed.csv')
df

Response: 2598 rows × 8 columns

       especie       fecha  apertura   maximo   minimo   cierre  volumen   timestamp
0   CEDEARAAPL  2011-06-27   5.15357  5.15357  5.15357  5.15357     6720  1309143600
1   CEDEARAAPL  2011-06-29   5.11250  5.11250  5.11250  5.11250    35280  1309316400
2   CEDEARAAPL  2011-07-01   5.17857  5.17857  5.17857  5.17857     1680  1309489200
3   CEDEARAAPL  2011-07-21   5.84821  5.84821  5.82142  5.82142    38640  1311217200
4   CEDEARAAPL  2011-07-22   5.95535  5.95535  5.95535  5.95535     1400  1311303600
..         ...         ...       ...      ...      ...      ...      ...         ...
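
The request above hardcodes the end date. A minimal sketch of building the same form fields with today's date as the default, assuming the endpoint accepts any ISO date in fecha_fin (not verified here). The helper name build_payload is hypothetical, and the token is a placeholder:

```python
from datetime import date


def build_payload(especie, access_token, start="0000-00-00", end=None):
    """Build the form fields for the historical-quotes request.

    Hypothetical helper: the field names come from the request above,
    the default end date (today) is an assumption.
    """
    if end is None:
        end = date.today().isoformat()  # e.g. '2022-07-21'
    return {
        "access_token": access_token,
        "especie": especie,
        "fecha_inicio": start,
        "fecha_fin": end,
    }


# Placeholder token; use the one seen in the Network tab.
payload = build_payload("CEDEARAAPL", "<your token>", end="2022-07-21")
print(payload)
```

The resulting dict can be passed as data= to requests.post, exactly as in the snippet above.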

CodePudding user response:

If access_token changes, you can use the following example to fetch it from the page and load the data into a pandas DataFrame:

import requests
import pandas as pd
from bs4 import BeautifulSoup


url = "https://www.rava.com/perfil/CEDEARAAPL"
api_url = (
    "https://clasico.rava.com/lib/restapi/v3/publico/cotizaciones/historicos"
)

soup = BeautifulSoup(requests.get(url).content, "html.parser")
access_token = soup.find("navbar-c")[":access_token"].strip("'")

payload = {
    "access_token": access_token,
    "especie": "CEDEARAAPL",
    "fecha_inicio": "0000-00-00",
    "fecha_fin": "2022-07-22",
}

df = pd.DataFrame(requests.post(api_url, data=payload).json()["body"])
print(df.head(5).to_markdown(index=False))

Prints:

especie     fecha       apertura  maximo   minimo   cierre   volumen  timestamp
CEDEARAAPL  2011-06-27   5.15357  5.15357  5.15357  5.15357     6720  1309143600
CEDEARAAPL  2011-06-29   5.1125   5.1125   5.1125   5.1125     35280  1309316400
CEDEARAAPL  2011-07-01   5.17857  5.17857  5.17857  5.17857     1680  1309489200
CEDEARAAPL  2011-07-21   5.84821  5.84821  5.82142  5.82142    38640  1311217200
CEDEARAAPL  2011-07-22   5.95535  5.95535  5.95535  5.95535     1400  1311303600
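
Since the site has already changed its markup once, the token lookup may stop working again. A small sketch of the same extraction with an explicit check, run against a made-up HTML snippet so it works offline. The function name extract_access_token and the sample markup are assumptions; only the navbar-c tag and the :access_token attribute come from the code above:

```python
from bs4 import BeautifulSoup


def extract_access_token(html):
    """Return the token stored in <navbar-c :access_token="'...'">.

    Raises ValueError instead of a bare TypeError/KeyError when the
    tag or attribute is missing, e.g. after a site redesign.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("navbar-c")
    if tag is None or not tag.has_attr(":access_token"):
        raise ValueError("access_token not found; the page markup may have changed")
    # The attribute value is wrapped in single quotes, so strip them.
    return tag[":access_token"].strip("'")


# Hypothetical markup that mimics the structure the answer relies on.
sample_html = """<html><body><navbar-c :access_token="'abc123'"></navbar-c></body></html>"""
print(extract_access_token(sample_html))  # abc123
```

In the real script you would pass requests.get(url).content instead of sample_html.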