I am trying to retrieve SIREN information from the Insee database, with a dynamic value in the URL.
The status code has to be in the 200 to 299 range. The result I get is None, None.
import pandas as pd
import requests

def extract_siren_code(siren):
    siren_recup, features = None, None
    base_url = "https://api.insee.fr/entreprises/sirene/V3/siren/"
    endpoint = f"{base_url}{siren}"
    headers = {"Authorization": "Bearer <my bearer token>", "Accept": "application/json"}
    response = requests.get(endpoint, headers=headers)
    if response.status_code not in range(200, 299):
        return None, None
    try:
        '''
        This try block is here in case any of our inputs are invalid. This is done instead
        of actually writing out handlers for all kinds of responses.
        '''
        results = response.json()['uniteLegale'][0]
        print(results)
        siren_recup = results['siren']
        features = ['uniteLegale']
    except:
        pass
    return siren_recup, features

siren_recup, features = extract_siren_code('824239214')
print(siren_recup, features)
CodePudding user response:
Here:
if response.status_code not in range(200, 299):
    return None, None
You say the status code has to be between 200 and 299, and that the result you get is None, None.
Your code may be returning from this branch because of a 3xx, 4xx or 5xx HTTP status code.
Check response.status_code, for example with:
print(f"response.status_code: {response.status_code}")
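To make that check more informative, here is a minimal sketch of a helper that turns a status code into a readable message. The name describe_status is hypothetical (it is not part of requests or of the Insee API); it only uses the standard library's http.HTTPStatus. It also treats the whole 2xx range as success, whereas range(200, 299) stops at 298.

```python
from http import HTTPStatus


def describe_status(status_code):
    """Return a short, human-readable description of an HTTP status code.

    Hypothetical helper for debugging; not part of requests.
    """
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        # Codes that are not registered in HTTPStatus (for example 299).
        phrase = "Unknown status"
    # The success class is 200 through 299 inclusive.
    outcome = "success" if 200 <= status_code < 300 else "failure"
    return f"{status_code} {phrase} ({outcome})"


print(describe_status(401))
print(describe_status(200))
```

Inside extract_siren_code you could call print(describe_status(response.status_code)) just before the early return, so that a 401 or 404 is visible instead of silently producing None, None.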
Also, there was no need to post the bearer token; now that it has been exposed, you need to regenerate it.
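One way to avoid pasting the token into code you share is to read it from an environment variable. This is only a sketch: the variable name INSEE_TOKEN and the helper build_headers are assumptions made for illustration, not something the Insee API requires.

```python
import os


def build_headers(env_var="INSEE_TOKEN"):
    """Build the request headers from a token stored in the environment.

    env_var is a hypothetical variable name; use whatever name you prefer.
    """
    token = os.environ.get(env_var)
    if not token:
        # Fail loudly rather than sending a request with an empty token.
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"Authorization": f"Bearer {token}", "Accept": "application/json"}
```

You would then pass headers=build_headers() to requests.get, and the token never appears in the source you post.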
CodePudding user response:
Actually, some companies have only one uniteLegale, and in that case the API replies with a dict instead of a list containing one dict, so you need to add a condition for this case:
import pandas as pd
import requests

def extract_siren_code(siren):
    siren_recup, features = None, None
    base_url = "https://api.insee.fr/entreprises/sirene/V3/siren/"
    endpoint = f"{base_url}{siren}"
    headers = {"Authorization": "Bearer <my bearer token>", "Accept": "application/json"}
    response = requests.get(endpoint, headers=headers)
    # range() excludes its upper bound, so 300 is needed to cover 200 through 299.
    if response.status_code not in range(200, 300):
        return None, None
    try:
        # This try block is here in case any of our inputs are invalid. This is done
        # instead of actually writing out handlers for all kinds of responses.
        unite_legale = response.json()['uniteLegale']
        # The API may return a single dict or a list of dicts.
        results = unite_legale[0] if isinstance(unite_legale, list) else unite_legale
        siren_recup = results['siren']
        features = ['uniteLegale']
    except Exception:
        # Any parsing problem leaves siren_recup and features as None.
        pass
    return siren_recup, features
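The dict-or-list handling can also be pulled out into a small function that works on the decoded JSON, which makes it easy to test without calling the API. This is a sketch under assumptions: parse_unite_legale is a hypothetical name, the sample payloads in the comments only mimic the two shapes described above, and it returns the whole record as the second value on the guess that features was meant to hold it.

```python
def parse_unite_legale(payload):
    """Return (siren, record) from a decoded JSON payload, or (None, None).

    Hypothetical helper: accepts 'uniteLegale' as either a dict
    or a list of dicts, and never raises on an unexpected shape.
    """
    unite_legale = payload.get("uniteLegale") if isinstance(payload, dict) else None
    if isinstance(unite_legale, list):
        # Take the first entry, or nothing if the list is empty.
        unite_legale = unite_legale[0] if unite_legale else None
    if not isinstance(unite_legale, dict):
        return None, None
    return unite_legale.get("siren"), unite_legale


# Illustrative payloads only; real responses contain many more fields.
print(parse_unite_legale({"uniteLegale": {"siren": "824239214"}}))
print(parse_unite_legale({"uniteLegale": [{"siren": "824239214"}]}))
print(parse_unite_legale({"header": {"statut": 404}}))
```

With this in place, extract_siren_code could simply return parse_unite_legale(response.json()) after the status check, and the broad try/except would no longer be needed for the parsing step.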