I'd like to gather some lecturer information and export it to CSV. I read some articles and tutorials on doing this with Python. The code looks like this:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
from bs4 import BeautifulSoup
driver = webdriver.Chrome(ChromeDriverManager().install())
url = "https://uin-suka.ac.id/id/page/detil_dosen/197606110000002301"
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'lxml')
df = pd.read_html(str(soup))[0]
print(df)
and the output looks like this:
0 1 2
0 Nama : Dr. Nina Mariani Noor, SS., MA.
1 Program Studi : Interdisciplinary Islamic Studies
2 Fakultas : Pascasarjana
3 Jenis Pegawai | Status : Pegawai Tetap BLU | Aktif Mengajar
4 Jabatan Akademik | Golongan : Lektor | III/C
5 Email : -
6 Pendidikan Terakhir : S3
Process finished with exit code 0
The CSV table I created looks like this
The problem is: how can I pull the second column and put it into the CSV as a row?
CodePudding user response:
Without using a DataFrame, and assuming the input res is a dict:
import csv
keys = res.keys()
with open("output.csv", "w", newline="") as f:
    dict_writer = csv.DictWriter(f, keys)
    dict_writer.writeheader()
    dict_writer.writerow(res)  # writerow, because res is a single dict, not a list of dicts
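The dict approach above leaves open where `res` comes from. As a rough sketch, assuming the scraped table has the three-column layout shown in the question (label, ":", value), the dict can be built by pairing column 0 with column 2. The small hand-built frame and the `write_row` helper are illustrative stand-ins, not part of the original code.

```python
import csv
import io

import pandas as pd

# Stand-in for the scraped table: column 0 = label, 1 = ":", 2 = value.
df = pd.DataFrame([
    ["Nama", ":", "Dr. Nina Mariani Noor, SS., MA."],
    ["Program Studi", ":", "Interdisciplinary Islamic Studies"],
    ["Fakultas", ":", "Pascasarjana"],
])

# Pair each label with its value to get one flat dict per lecturer.
res = dict(zip(df[0], df[2]))


def write_row(record, fileobj):
    """Write one dict as a header line plus a single data row (hypothetical helper)."""
    writer = csv.DictWriter(fileobj, fieldnames=list(record))
    writer.writeheader()
    writer.writerow(record)


# An in-memory buffer keeps the sketch side-effect free;
# swap it for open("output.csv", "w", newline="") to write a real file.
buffer = io.StringIO()
write_row(res, buffer)
print(buffer.getvalue())
```

Values that contain commas, such as the name here, are quoted automatically by the csv module.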
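Using the DataFrame, I think you need to transpose it first:
df = df.drop(1, axis=1)   # drop the ":" separator column (the column labels are integers)
df = df.set_index(0).T    # the field names become the column headers
df.to_csv('output.csv', index=False)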
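To see the reshaping step in isolation, here is a minimal sketch on a hand-built frame that mimics the question's output. It assumes the column labels are the integers 0, 1 and 2, as in the printed table; only three of the fields are included to keep it short.

```python
import pandas as pd

# Stand-in for the scraped table: column 0 = label, 1 = ":", 2 = value.
df = pd.DataFrame([
    ["Nama", ":", "Dr. Nina Mariani Noor, SS., MA."],
    ["Fakultas", ":", "Pascasarjana"],
    ["Pendidikan Terakhir", ":", "S3"],
])

row = (
    df.drop(columns=1)  # discard the ":" separator column
      .set_index(0)     # labels become the index ...
      .T                # ... and, after transposing, the column headers
)

print(row.shape)          # one row, three columns
print(list(row.columns))  # the former labels

# index=False keeps the leftover row label (2) out of the file.
csv_text = row.to_csv(index=False)
print(csv_text)
```

Passing a path to `to_csv` instead of calling it without one writes the same text to disk.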
CodePudding user response:
With pandas, try pandas.read_html() and DataFrame.transpose() (or its shorthand .T) to reach your goal without the Selenium overhead:
import pandas as pd
pd.read_html('https://uin-suka.ac.id/id/page/detil_dosen/197606110000002301')[0]\
.set_index(0)\
.T.drop(1)\
.to_csv('myfile.csv', index=False)
Nama | Program Studi | Fakultas | Jenis Pegawai \| Status | Jabatan Akademik \| Golongan | Email | Pendidikan Terakhir
---|---|---|---|---|---|---
Dr. Nina Mariani Noor, SS., MA. | Interdisciplinary Islamic Studies | Pascasarjana | Pegawai Tetap BLU \| Aktif Mengajar | Lektor \| III/C | - | S3
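If more than one lecturer page has to end up in the same CSV, the same chain can be applied per page and the resulting one-row frames stacked. The sketch below assumes every page yields the same label / ":" / value layout. `profile_to_row` is a hypothetical helper, and the two frames hold made-up placeholder values rather than real scraped data; in practice each would come from `pd.read_html(url)[0]`.

```python
import pandas as pd


def profile_to_row(table):
    """Turn a label / ':' / value table into a one-row frame (hypothetical helper)."""
    return (
        table.set_index(0)           # labels become the index
             .T                      # transpose: labels are now the headers
             .drop(1)                # remove the row that held the ":" separators
             .reset_index(drop=True)
    )


# Placeholder tables with made-up values, one per lecturer page.
first = pd.DataFrame([["Nama", ":", "Lecturer A"], ["Fakultas", ":", "Faculty X"]])
second = pd.DataFrame([["Nama", ":", "Lecturer B"], ["Fakultas", ":", "Faculty Y"]])

# Stack the one-row frames so each lecturer occupies one CSV row.
combined = pd.concat(
    [profile_to_row(t) for t in (first, second)],
    ignore_index=True,
)

csv_text = combined.to_csv(index=False)
print(csv_text)
```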