How to scrape a table and convert it into CSV with Python?

Time:09-19

I'd like to gather some lecturer information and export it to CSV. I followed a few articles and tutorials that use Python. The code looks like this:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
from bs4 import BeautifulSoup

driver = webdriver.Chrome(ChromeDriverManager().install())

url = "https://uin-suka.ac.id/id/page/detil_dosen/197606110000002301"

driver.get(url)

soup = BeautifulSoup(driver.page_source, 'lxml')

df = pd.read_html(str(soup))[0]
print(df)

and the output looks like this:

                             0  1                                   2
0                         Nama  :     Dr. Nina Mariani Noor, SS., MA.
1                Program Studi  :   Interdisciplinary Islamic Studies
2                     Fakultas  :                        Pascasarjana
3       Jenis Pegawai | Status  :  Pegawai Tetap BLU | Aktif Mengajar
4  Jabatan Akademik | Golongan  :                      Lektor | III/C
5                        Email  :                                   -
6          Pendidikan Terakhir  :                                  S3

Process finished with exit code 0

The CSV table I created looks like this:

[image: table]

The problem is: how can I pull the values (column 2 in the output above) and write them to the CSV as a single row?

CodePudding user response:

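Without using a DataFrame, and assuming the input res is a single dict that maps each label to its value:

import csv

keys = res.keys()
with open("output.csv", "w", newline="") as f:
    dict_writer = csv.DictWriter(f, keys)
    dict_writer.writeheader()
    dict_writer.writerow(res)  # writerow, not writerows: res is one record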
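
The dict itself can be built from the DataFrame in the question by pairing column 0 (labels) with column 2 (values). A minimal sketch, not a definitive implementation: the record_to_csv helper name is hypothetical, the sample DataFrame is typed in by hand from the first three rows of the printed output rather than scraped, and an in-memory buffer stands in for the output file.

```python
import csv
import io

import pandas as pd

# Hand-built stand-in for the scraped table (assumption: first three rows only).
df = pd.DataFrame([
    ["Nama", ":", "Dr. Nina Mariani Noor, SS., MA."],
    ["Program Studi", ":", "Interdisciplinary Islamic Studies"],
    ["Fakultas", ":", "Pascasarjana"],
])


def record_to_csv(record, fileobj):
    """Write one dict as a header line followed by a single data row."""
    writer = csv.DictWriter(fileobj, fieldnames=list(record))
    writer.writeheader()
    writer.writerow(record)  # singular: record is one row, not a list of rows


# Column 0 holds the labels, column 2 the values; column 1 is only ":".
res = dict(zip(df[0], df[2]))

# Swap the buffer for open("output.csv", "w", newline="") to write a real file.
buffer = io.StringIO()
record_to_csv(res, buffer)
print(buffer.getvalue())
```

The csv module quotes the name field automatically because it contains commas, so the row still parses back into three fields.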

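Using the DataFrame, I think you need to transpose it first. read_html labels the columns with the integers 0, 1 and 2, so drop column 1 (the colons) and use column 0 as the index:

df = df.drop(columns=1)
df = df.set_index(0).T
df.to_csv('output.csv', index=False)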
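
To see what the transpose does without requesting the page, here is a small sketch on a hand-built copy of the table. It assumes the integer column labels 0, 1 and 2 shown in the printed output in the question; the variable names wide and csv_text are illustrative only.

```python
import pandas as pd

# Hand-built stand-in for the scraped table (assumption: first three rows only).
df = pd.DataFrame([
    ["Nama", ":", "Dr. Nina Mariani Noor, SS., MA."],
    ["Program Studi", ":", "Interdisciplinary Islamic Studies"],
    ["Fakultas", ":", "Pascasarjana"],
])

wide = (
    df.drop(columns=1)  # remove the ":" separator column
      .set_index(0)     # labels become the index
      .T                # transpose: labels become the column headers
)

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = wide.to_csv(index=False)
print(csv_text)
```

After the transpose the frame has a single row, and index=False keeps the leftover row label (2) out of the file.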

CodePudding user response:

Since you are already using pandas, you can let pandas.read_html() fetch the page directly and use DataFrame.transpose() / .T to reach your goal without the Selenium overhead:

import pandas as pd

pd.read_html('https://uin-suka.ac.id/id/page/detil_dosen/197606110000002301')[0]\
    .set_index(0)\
    .T.drop(1)\
    .to_csv('myfile.csv', index=False)

Contents of myfile.csv:

Nama,Program Studi,Fakultas,Jenis Pegawai | Status,Jabatan Akademik | Golongan,Email,Pendidikan Terakhir
"Dr. Nina Mariani Noor, SS., MA.",Interdisciplinary Islamic Studies,Pascasarjana,Pegawai Tetap BLU | Aktif Mengajar,Lektor | III/C,-,S3