Extracting values from a webpage that returns dictionary-format data using Python

Time:09-27

I want to scrape the Name, Phone Number and Email from the webpage, but the entire detail seems to be in a dictionary nested inside another dictionary. Please correct me if I am wrong; I am confused about how to extract these values into their particular columns. Here is the code:

import requests
from bs4 import BeautifulSoup
from csv import writer

url ='https://mainapi.dadeschools.net/api/v1/employees?limit=10&skip=0&sortDesc=false'
R = requests.get(url)

soup = BeautifulSoup(R.text, 'html.parser')
print(soup)
with open('school.csv', 'a', encoding='utf8', newline='') as f:
    thewriter = writer(f)
    header = ['Name', 'Location', 'Phone Number', 'Email']
    thewriter.writerow(header)
    thewriter.writerow(soup)

CodePudding user response:

You do not need BeautifulSoup here. The URL is an API endpoint that returns JSON, so simply request it and convert the JSON to CSV with csv.DictWriter.

Example

import csv

import requests

url = 'https://mainapi.dadeschools.net/api/v1/employees?limit=10&skip=0&sortDesc=false'
# The API returns JSON; the employee records are the list under 'items'
data = requests.get(url).json()['items']

with open('my.csv', 'w', newline='', encoding='utf8') as output_file:
    # Use the keys of the first record as the CSV header
    dict_writer = csv.DictWriter(output_file, data[0].keys())
    dict_writer.writeheader()
    dict_writer.writerows(data)
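
If only a few fields are needed (for example Name, Phone Number and Email) and some of them sit inside a nested dictionary, one option is to pick them out by path before writing. The following is a minimal sketch, not a tested solution for this API: the field names ("name", "contact.phone", "contact.email") and the sample records are hypothetical placeholders, so check the real keys in the JSON response and adjust the mapping. The helpers get_path and write_selected are illustrative names, not part of any library.

```python
import csv
import io


def get_path(record, dotted_path, default=""):
    """Follow a dotted path such as 'contact.email' through nested dicts."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default  # missing field: leave the cell empty
        current = current[key]
    return current


def write_selected(records, columns, out_file):
    """Write one CSV row per record; `columns` maps CSV header -> dotted path."""
    writer = csv.writer(out_file)
    writer.writerow(columns.keys())
    for record in records:
        writer.writerow(get_path(record, path) for path in columns.values())


# Hypothetical sample shaped like a nested API response; NOT real data
# and NOT the real key names of the API in the question.
sample = {
    "items": [
        {"name": "Ada Example", "contact": {"phone": "555-0100", "email": "ada@example.com"}},
        {"name": "Bob Example", "contact": {"phone": "555-0101"}},
    ]
}

# CSV header -> where to find the value in each record (assumed paths)
columns = {"Name": "name", "Phone Number": "contact.phone", "Email": "contact.email"}

buffer = io.StringIO()  # stands in for open('my.csv', 'w', newline='')
write_selected(sample["items"], columns, buffer)
print(buffer.getvalue())
```

With the real API, sample["items"] would be replaced by requests.get(url).json()['items'] and the buffer by an opened file.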

EDIT

As mentioned by Barry the Platipus, there are also a couple of ways to do this with pandas:

import pandas as pd

pd.json_normalize(
    pd.read_json('https://mainapi.dadeschools.net/api/v1/employees?limit=10&skip=0&sortDesc=false')['items']
).to_csv('my.csv', index=False)

or

pd.DataFrame(
    pd.read_json('https://mainapi.dadeschools.net/api/v1/employees?limit=10&skip=0&sortDesc=false')['items']
    .values.tolist()
).to_csv('my.csv', index=False)
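
To keep only certain columns with pandas, the flattened frame can be filtered and renamed before it is written. This is a rough sketch on made-up records: the keys "name", "contact.phone" and "contact.email" are assumptions for illustration, not the actual field names returned by the API. pd.json_normalize flattens nested dictionaries into dot-separated column names, which is what makes the selection possible.

```python
import io

import pandas as pd

# Hypothetical records with a nested dict; NOT real data or real key names.
records = [
    {"name": "Ada Example", "contact": {"phone": "555-0100", "email": "ada@example.com"}},
    {"name": "Bob Example", "contact": {"phone": "555-0101", "email": "bob@example.com"}},
]

# Nested keys become columns such as 'contact.phone'
df = pd.json_normalize(records)

# Flattened column name -> header wanted in the CSV (assumed mapping)
wanted = {"name": "Name", "contact.phone": "Phone Number", "contact.email": "Email"}
subset = df[list(wanted)].rename(columns=wanted)

buffer = io.StringIO()  # stands in for a path such as 'my.csv'
subset.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Printing df.columns on the real response is a quick way to see which flattened names are available.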