I am getting information from an API using Python requests with the following code:
import json
import requests
resp = requests.get('https://api.oxoservices.eu/api/v1/startups?site=labs&startup_status=funded')
json_resp = json.loads(resp.text)
for company in json_resp['data']:
    print(json.dumps(company, indent=4))
    print()
with open("test.json", "w", encoding='utf-8') as file:
    # file.write(str(json_resp))
    json.dump(json_resp, file, indent=4, sort_keys=True)
It extracts all the information I need, but also a lot that I don't need, which is my problem.
I get the output:
"data": [
    {
        "cover": null,
        "cover_id": null,
        "created_at": "2021-01-05T05:56:03.000000Z",
        "focus": {
            "color": "#25c9b6",
            "created_at": "2016-06-15T10:46:50.000000Z",
            "id": 15,
            "is_active": true,
            "name": "Financial Technologies",
            "updated_at": "2016-06-15T10:46:50.000000Z"
        },
        "focus_id": 15,
        "id": 1111,
        "irr": 0,
        "is_active": false,
        "name": "iconicchain",
        "photo": {
            "created_at": "2021-11-15T17:16:17.000000Z",
            "filename": "iconicchain.png",
            "id": "52b7c33f-c74c-4099-88cb-944b4047cf85",
            "mime": "image/png",
            "size": 14056,
            "type": "photo",
            "url": "/attachments/52b7c33f-c74c-4099-88cb-944b4047cf85"
        },
        "photo_id": "52b7c33f-c74c-4099-88cb-944b4047cf85",
        "raised_type": {
            "id": 3,
            "key": "seed",
            "name": "Seed"
        },
        "startup_investment_type": {
            "id": 1,
            "key": "none",
            "name": "Not seeking"
        },
        "startup_stage_id": 4,
        "startup_status": {
            "id": 5,
            "key": "funded",
            "name": "Funded"
        },
        "startup_valuation_basis": {
            "id": 3,
            "key": "next_funding_round",
            "name": "Next funding round"
        },
        "summary": "Compliance based on facts, not faith-delivering regulatory compliance automation solutions for the financial sector.",
        "video_id": null,
        "video_type_id": "1",
        "website": "https://www.iconicchain.com"
From this data I only want to extract the website, which in this case is https://www.iconicchain.com, and the name at the top, iconicchain.
Answer:
First of all, if you're using requests to pull data from a JSON API, you don't need to import the json package to parse it: requests parses JSON by itself through the response's .json() method. You can do what you need with just requests and pandas:
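import requests
import pandas as pd
r = requests.get('https://api.oxoservices.eu/api/v1/startups?site=labs&startup_status=funded')
# r.json() parses the response body; each entry of 'data' becomes one row
df = pd.DataFrame(r.json()['data'])
# keep only the two columns of interest
df = df[['name', 'website']]
print(df)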
This prints:
            name                      website
0    iconicchain  https://www.iconicchain.com
1  Gloster Nyrt.          https://gloster.hu/
2        Vilhemp           https://vilhemp.hu
3       HackRate           https://hckrt.com/
4     Commsignia    http://www.commsignia.com
5       BitNinja          https://bitninja.io
[...]
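If pandas isn't otherwise needed, the same filtering can be done with plain dictionaries. The sketch below is an illustrative alternative, not part of the original answer. It assumes the response keeps the shape shown in the question (a top-level "data" list of objects that carry "name" and "website" keys), and the helper names extract_fields, fetch_payload and save_fields are hypothetical. It also writes only those two fields to test.json, instead of the whole response as in the question's code.

```python
import json

API_URL = "https://api.oxoservices.eu/api/v1/startups?site=labs&startup_status=funded"


def extract_fields(payload, fields=("name", "website")):
    """Return one dict per startup in payload['data'], keeping only `fields`.

    dict.get() is used so an entry missing a field yields None instead of a KeyError.
    """
    return [{key: company.get(key) for key in fields} for company in payload.get("data", [])]


def fetch_payload(url=API_URL):
    """Download and parse the API response (needs the requests package)."""
    import requests  # imported here so extract_fields() works without requests installed

    resp = requests.get(url, timeout=10)
    resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    return resp.json()  # requests parses the JSON body itself


def save_fields(records, path="test.json"):
    """Write the filtered records to a JSON file."""
    with open(path, "w", encoding="utf-8") as file:
        # ensure_ascii=False keeps non-ASCII company names readable in the file
        json.dump(records, file, indent=4, ensure_ascii=False)


# Usage (makes a network request):
# startups = extract_fields(fetch_payload())
# for startup in startups:
#     print(startup["name"], startup["website"])
# save_fields(startups)
```

Because of dict.get(), a startup without a website comes back with None rather than raising KeyError; index with company[key] instead if you would rather fail on missing fields.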