I am trying to build a scraper, but the URLs it collects are not displayed in full: the output cuts them off with ... so I cannot use them to scrape the product pages. This is the code:
from bs4 import BeautifulSoup
import requests
import urllib.request
import pandas as pd
import numpy as np

url = input("URL to scrape: ")  # https://www.plasticosur.com/hostelería#/pageSize=36&viewMode=grid&orderBy=15&pageNumber=1
pagina = urllib.request.urlopen(url).read().decode()
elementos = BeautifulSoup(pagina, 'html.parser')
productos = elementos.find_all('div', class_='picture')
for div in productos:
    # relative link of the product inside each "picture" div
    out_dicts = div.find('a')['href']
    #d = out_dict
    #df = pd.DataFrame((d), columns=['url'], index=['pagina'])
    #item = out_dicts
    urlpagina = f'https://www.plasticosur.com{out_dicts}'
    df = pd.DataFrame(urlpagina, index=['pagina'], columns=['url'])
    df = df.drop_duplicates()
    #datos = urllib.request.urlopen(urlpagina).read().decode()
    #soup = BeautifulSoup(datos)
    #titulo = soup('div')
    print(df)
Any idea how to fix it?
CodePudding user response:
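The URLs are stored in full. pandas only shortens long strings with ... when it displays a DataFrame (the display.max_colwidth option defaults to 50 characters). To print the complete value, just replace
print(df)
with
print(df.url.values[0])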
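If you would rather keep printing the whole DataFrame, another option is to lift the display limit instead of reading the value out. The following is only a minimal sketch: long_url is a made-up value standing in for a scraped link, not something taken from the site.

```python
import pandas as pd

# Hypothetical long URL standing in for a scraped product link (an assumption).
long_url = "https://www.plasticosur.com/" + "producto-de-ejemplo-" * 5
df = pd.DataFrame({"url": [long_url]}, index=["pagina"])

# Default display: strings longer than display.max_colwidth are cut with "...".
truncated = repr(df)

# Option 1: remove the column-width limit only inside this block.
with pd.option_context("display.max_colwidth", None):
    full = repr(df)

# Option 2: read the value directly, bypassing the DataFrame display.
first_url = df["url"].iloc[0]

print(truncated)
print(full)
print(first_url)
```

pd.set_option("display.max_colwidth", None) does the same thing for the whole session. In both cases the data in df is unchanged, only the way it is displayed differs.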