Iterate scraping through URLs


I have this code that I'm trying to run, but I get an error about an invalid schema:

# for index, row in df.iterrows():
#     print(index, row["Data"])
for offset in df.apply(lambda row: row["Data"], axis=1):
    response = requests.get(df["Data"])
    print('url:', response.url)
    

This is my DataFrame: each row holds the group of links for one page (10 per page), and there are two rows, so 20 links in total:

Data
0    [http://www.mercadopublico.cl/Procurement/Modu...
1    [http://www.mercadopublico.cl/Procurement/Modu...

I want the code to run over each group of 10 links, scrape them and get the data, then move on to the next group, with all the scraped data ending up in one table; something like the sketch below.
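In other words, the flow I'm aiming for looks roughly like this (a sketch only: it assumes each cell in Data holds a list of URL strings, and parse_row is a hypothetical stand-in for the real parsing logic):

import requests
import pandas as pd

def parse_row(response):
    # hypothetical parser: pull whatever fields you need out of
    # response.text (e.g. with BeautifulSoup); here just url + status
    return {"url": response.url, "status": response.status_code}

all_rows = []                        # records collected from every page
for urls in df["Data"]:              # each cell holds a list of ~10 URLs
    for url in urls:
        response = requests.get(url)
        all_rows.append(parse_row(response))

result = pd.DataFrame(all_rows)      # one table with all the scraped data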

But I can't make requests.get use the URLs stored inside the DataFrame.

I get this message:

InvalidSchema: No connection adapters were found for '0    [http://www.mercadopublico.cl/Procurement/Modu...\n1    [http://www.mercadopublico.cl/Procurement/Modu...\nName: Data, dtype: object'

Do you have any advice on this? Best regards.

I think it would also help to fuse both index rows into one, but I'm not sure how to do that; I searched a lot but couldn't find out how. I tried some references to np.array, but that didn't work.
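One way to do that fusing (a sketch, assuming the Data cells are Python lists of URL strings) is pandas' explode, which flattens the two list-valued cells into a single column of URLs:

import requests

# explode() turns the two list-valued cells into one flat column,
# one URL per row; reset_index makes the index run 0..19
urls = df["Data"].explode().reset_index(drop=True)

for url in urls:
    response = requests.get(url)
    print('url:', response.url)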

CodePudding user response:

Just to answer, because I solved it: never store URLs in a DataFrame if you are going to scrape them later. Instead of building a DataFrame resultsurl, store them in a plain list: resultsurl = list().

Then iterate over that list, as in for url in resultsurl: (in this case the list is called resultsurl).

Thanks
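A minimal sketch of that approach (the example.com URLs are just placeholders for the real scraped links):

import requests

# keep the scraped links in a plain list instead of a DataFrame
resultsurl = list()
resultsurl.append("https://example.com/page1")   # placeholder URLs
resultsurl.append("https://example.com/page2")

# then iterate over the list directly
for url in resultsurl:
    response = requests.get(url)
    print('url:', response.url)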
