ajax search and scraping prices of bus page // maybe selenium?

Time:03-31

I am trying to get the prices of routes from a bus company's page:

import requests
from bs4 import BeautifulSoup
import re

popup_linkz = list()
p = range(1, 2, 1)
for i in p:

    def get_headers(session):
        res = session.get("https://new.turbus.cl/turbuscl/inicio-compra")
        if res.status_code == 200:
            print("Got headers")
            return res.text
        else:
            print("Failed to get headers")

    def search(session):
        data = {
            'origenInputModal': 'Santiago',
            'destinoInputModal': 'Calama',
            'fechaRegreso': '03-04-2021',
            'fechaIda': '31-03-2021',
        }
        res = session.post(
            "https://new.turbus.cl/turbuscl/seleccion-itinerario",
            data=data)  # not sure if this is the search link
        if res.status_code == 200:
            print("Search succeeded")
            return res.text
        else:
            print("Search failed with error:", res.reason)
        print(res.text)

    def get_popup_link(html):
        soup = BeautifulSoup(html, "html.parser")
        for t in soup.find_all('div', {'class': 'ticket_price-value'}):
            precio = t.find('[class$="ticket_price-value"]').text
            #cantidad = t.select_one('[id$="lblCantidad"]').text
            #descripction = t.select_one('[id$="lblDescripcion"]').text
            print(f"{precio=} {precio=}")
            #print()
            return precio

    def main():
        with requests.Session() as s:
            get_headers(s)
            html = search(s)
            popup_links = (get_popup_link(html))
            print(popup_links)
            #popup_linkz.extend(popup_links)
            #print(popup_links)
            #print(popup_linkz)
            #download_html = get_download_html(s, popup_links)
            #print(download_html)
            #popup_linkz.extend(popup_links for i in range(0, 1, 1))

    main()
                
#a = popup_linkz
#print(a)


This is the link: https://new.turbus.cl/turbuscl/inicio-compra

Right now I am able to find the input boxes of the search form, but I am not sure where to submit it.

I am getting this error: ValueError: too many values to unpack (expected 2)

so I am not sure what I am doing wrong.

Could you enlighten me so I can get this working?

I have been trying all day, and I now have a new approach using Selenium to run the search.

Is what I am doing right, or was my first approach better?

# -*- coding: utf-8 -*-
"""
Created on Tue Mar 29 21:04:05 2022

@author: christian marcos
"""

# -*- coding: utf-8 -*-
"""
Created on Tue Mar 29 16:20:40 2022

@author: christian marcos
"""

import requests
from bs4 import BeautifulSoup
from bs4 import BeautifulSoup as bs
from pandas.io.html import read_html
from selenium import webdriver
from selenium import webdriver as wd
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait



# Open the page and click the origin field
driver = wd.Chrome('C:\\chromedriver.exe')
driver.maximize_window()
driver.get('https://new.turbus.cl/turbuscl/inicio-compra')

driver.implicitly_wait(20)
driver.find_element(By.XPATH, '//*[@id="origen"]').click()
wait = WebDriverWait(driver, 30)

# Select the first city listed in the origin modal
driver.implicitly_wait(10)
driver.find_element(By.XPATH, '//*[@id="modalOriginCity"]/div/div/div[2]/div[2]/ul/li[1]').click()

Best regards,

Answer:

The POST data the site expects is different from the form fields you are sending. In this case, you need:

{
  "fechaSalidaTramo": "31/03/2022",
  "mnemotecnicoCiudadOrigenTramo": "stgo",
  "mnemotecnicoCiudadDestinoTramo": "aric",
  "horaSalidaTramo": 0,
  "horaSalidaTramoMaxima": 0,
  "codigoLinea": 90,
  "numeroViaje": 0,
  "numeroCuentaCorrienteCliente": 0,
  "codigoIdaRegreso": 1,
  "cantidadAsientos": 1,
  "numeroRegistros": 0
}

And the endpoint to post to is https://new.turbus.cl/turbuscl/recursos/vtwst76/web1.

In Python, it'll look like this:

import requests

LINK = "https://new.turbus.cl/turbuscl/recursos/vtwst76/web1"

DATA = '{"fechaSalidaTramo":"31/03/2022","mnemotecnicoCiudadOrigenTramo":"stgo","mnemotecnicoCiudadDestinoTramo":"aric","horaSalidaTramo":0,"horaSalidaTramoMaxima":0,"codigoLinea":90,"numeroViaje":0,"numeroCuentaCorrienteCliente":0,"codigoIdaRegreso":1,"cantidadAsientos":1,"numeroRegistros":0}'


HEADERS = {
    "Content-Type": "application/json",
}

def get_route(origin, destination):
    res = requests.post(LINK, data=DATA, headers=HEADERS)
    if res.status_code == 200:
        print("getting routes")
        return res.json()
    else:
        print(res)


def main():
    info = get_route("here", "there")
    print(info)

if __name__ == "__main__":
    main()
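The code above sends a fixed DATA string, so the origin and destination arguments of get_route are not used yet. A minimal sketch of building the body from arguments follows. The helper name build_payload is hypothetical, and it assumes the endpoint takes the city mnemonics (such as "stgo" and "aric") and a DD/MM/YYYY date, as in the sample payload; the remaining fields are copied unchanged from that sample.

```python
import json
from datetime import date


def build_payload(origin, destination, travel_date):
    """Return the JSON request body for one search, as a string.

    origin and destination are assumed to be the site's city mnemonics;
    travel_date is a datetime.date, formatted as DD/MM/YYYY.
    """
    payload = {
        "fechaSalidaTramo": travel_date.strftime("%d/%m/%Y"),
        "mnemotecnicoCiudadOrigenTramo": origin,
        "mnemotecnicoCiudadDestinoTramo": destination,
        # The fields below are copied as-is from the sample payload.
        "horaSalidaTramo": 0,
        "horaSalidaTramoMaxima": 0,
        "codigoLinea": 90,
        "numeroViaje": 0,
        "numeroCuentaCorrienteCliente": 0,
        "codigoIdaRegreso": 1,
        "cantidadAsientos": 1,
        "numeroRegistros": 0,
    }
    # Compact separators give the same spacing as the DATA string.
    return json.dumps(payload, separators=(",", ":"))
```

Inside get_route, the returned string could then be passed as data= in place of DATA, for example build_payload(origin, destination, date(2022, 3, 31)). Whether the site accepts other mnemonics or dates has not been checked here.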

How I got to the answer:

  1. Go to the site.
  2. Open the browser's network tab to watch the requests.
  3. Do a search, and find the request that matches.
  4. Copy the request as a cURL command and import it into Postman.
  5. Remove headers one at a time and resend the request, checking for errors. Repeat until only the required headers remain.
  6. Copy the required headers and data, and test them using requests.
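Step 5 can also be done in code rather than by hand in Postman. The sketch below is one way to do it, not what was actually used for this answer: minimal_headers and request_ok are hypothetical names, and request_ok stands for any function you write that sends the request with the given headers and returns True when the response is still valid.

```python
def minimal_headers(headers, request_ok):
    """Drop headers one at a time, keeping only those the request needs.

    request_ok is a hypothetical callable: it receives a dict of headers,
    sends the request, and returns True if the response is still valid.
    """
    needed = dict(headers)
    for name in list(headers):
        # Try the request without this one header.
        trial = {k: v for k, v in needed.items() if k != name}
        if request_ok(trial):
            # Still works without it, so the header is not required.
            needed = trial
    return needed
```

For this site, request_ok could wrap a call such as requests.post(LINK, data=DATA, headers=h) and check that status_code is 200. This single pass assumes each header matters independently of the others, and it sends one request per header, so keep the copied header list short.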