Scrape with a loop and output each table to a different sheet in the same workbook in python

Time:08-02

I am trying to write the tables I scrape with this code to different sheets in the same workbook, each with a different name, but I can't make it work. I'm quite new to Python, so I would really appreciate some help. This is the part of the code that seems to work fine:

import requests
from bs4 import BeautifulSoup as bs
from time import sleep

masterlist = []
i = 0

url = "https://cryptopunks.app/cryptopunks/details/"

for cryptopunk in range(0,10): # The range of cryptopunks
    row_data = []
    sleep(2) # sleep time of loop so it doesn't break
    page = requests.get(url + str(i)) #change the address for each punk
    soup = bs(page.text, 'lxml') 
    table_body = soup.find('table')    
    for row in table_body.find_all('tr'): #get the rows of the table
        col = row.find_all('td') #get the cells
        col = [ele.text.strip().encode("utf-8") for ele in col]
        row_data.append(col) #append all in the file 
    masterlist.append (row_data)
    i = i + 1
    print(i)

But this is the part of the code I tried to use to output the tables, and it doesn't work:

    df = pd.DataFrame(masterlist).T
    writer = pd.ExcelWriter('group1.xlsx', engine='xlsxwriter')
    df.to_excel(writer,index=False)
    writer.save()

What I get with this code is the following: [image]

I would also like the tables to have the following column headers:

header=['Type', 'From', 'To', 'Amount', 'Txn']

Thanks

CodePudding user response:

One way to do this is to build a separate DataFrame for each punk inside the loop, then write every DataFrame to its own sheet through a single ExcelWriter.
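Stripped of the scraping, the multi-sheet part can be sketched as below. This is a minimal sketch, not a definitive version: the rows in `tables` are made-up placeholders rather than real scraped data, `write_tables` and the file name `group1_demo.xlsx` are hypothetical, and it assumes an Excel engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

HEADER = ["Type", "From", "To", "Amount", "Txn"]

# Made-up rows standing in for two scraped tables (placeholders, not real data)
tables = [
    [["Bid", "0xaaa", "", "1 ETH", "Jan 01, 2021"]],
    [
        ["Sold", "0xbbb", "0xccc", "2 ETH", "Feb 02, 2021"],
        ["Offered", "0xbbb", "", "3 ETH", "Feb 01, 2021"],
    ],
]


def write_tables(tables, path):
    """Write each table to its own sheet, named after its position in the list."""
    # A single writer holds the whole workbook; the with block saves it on exit
    with pd.ExcelWriter(path) as writer:
        for punk_id, rows in enumerate(tables):
            df = pd.DataFrame(rows, columns=HEADER)
            df.to_excel(writer, sheet_name=str(punk_id), index=False)


# Write to a temporary directory so the demo does not touch existing files
out_path = os.path.join(tempfile.mkdtemp(), "group1_demo.xlsx")
write_tables(tables, out_path)

# sheet_name=None reads every sheet back into a dict keyed by sheet name
sheets = pd.read_excel(out_path, sheet_name=None, dtype=str)
```

Creating the writer once, outside the loop, is what keeps all sheets in the same file. Opening a new writer on every iteration would overwrite the workbook each time. The full scraping version follows the same pattern.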

import pandas as pd
import requests
from bs4 import BeautifulSoup as bs
from time import sleep

masterlist = []
url = "https://cryptopunks.app/cryptopunks/details/"
num_cryptopunks = 10
for i in range(num_cryptopunks): # The range of cryptopunks
    row_data = []
    sleep(2) # sleep time of loop so it doesn't break
    page = requests.get(url + str(i)) #change the address for each punk
    soup = bs(page.text, 'lxml') 
    table_body = soup.find('table')    
    for row in table_body.find_all('tr'): #get the rows of the table
        col = row.find_all('td') #get the cells
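        col = [ele.text.strip() for ele in col] #keep the cells as text rather than bytes
        if col: #skip rows that have no <td> cells (e.g. a header row)
            row_data.append(col) #collect the row for this punk's table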

    df = pd.DataFrame(row_data)
    masterlist.append (df)

# Open the workbook once; the with block saves the file on exit
# (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('group1.xlsx') as writer:
    for cryptopunk, df in enumerate(masterlist):
        df.to_excel(writer, sheet_name=str(cryptopunk), index=False, header=['Type', 'From', 'To', 'Amount', 'Txn'])
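One caveat with the `header` list: `to_excel` expects it to have exactly as many entries as the DataFrame has columns, so a table whose rows are shorter or longer than five cells can make the write fail. A minimal sketch of one way to guard against that before building the DataFrame, assuming the five-column layout from the question; `normalize_rows` is a hypothetical helper, not part of pandas, and the sample rows are made up:

```python
HEADER = ["Type", "From", "To", "Amount", "Txn"]


def normalize_rows(rows, width=len(HEADER)):
    """Drop empty rows, pad short rows with '' and trim long rows to `width`."""
    cleaned = []
    for row in rows:
        if not row:  # e.g. a header row that had no <td> cells
            continue
        row = list(row)[:width]  # trim anything beyond the expected width
        cleaned.append(row + [""] * (width - len(row)))  # pad the remainder
    return cleaned


# Made-up sample: one empty row, one full row, one short row
sample = [[], ["Bid", "0xaaa", "", "1 ETH", "Jan 01"], ["Claimed", "0xbbb"]]
fixed = normalize_rows(sample)
```

Calling `pd.DataFrame(normalize_rows(row_data), columns=HEADER)` would then always give five columns. Note that trimming silently discards extra cells, so it is worth checking the page layout first if any rows turn out to be longer than expected.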