I'm sending a request to an API. This API is INSEE, it is the French official repositories of registered companies.
Here is my request:
headers = {
'Accept': 'application/json',
'Authorization': 'xxx'
}
params = (
('q', 'siren:530085802'),
('date', '2021-10-01'),
('champs', 'siret, denominationUniteLegale, codePostalEtablissement, libelleCommuneEtablissement, denominationUsuelleEtablissement'),
('debut', 1)
# ('nombre', 1)
)
response = requests.get('https://api.insee.fr/entreprises/sirene/V3/siret', headers=headers, params=params)
For this example, I'm requesting Facebook's legal entity by its number : siren:530085802
(this is Open Data, nothing here is confidential).
Now I get a response:
reponse.text
This :
'{"header":{"statut":200,"message":"OK","total":3,"debut":1,"nombre":2},"etablissements":[{"siret":"53008580200011","uniteLegale":{"denominationUniteLegale":"FACEBOOK FRANCE"},"adresseEtablissement":{"codePostalEtablissement":"75116","libelleCommuneEtablissement":"PARIS 16"},"periodesEtablissement":[{"dateFin":null,"dateDebut":"2012-06-14","denominationUsuelleEtablissement":null},{"dateFin":"2012-06-13","dateDebut":"2011-02-17","denominationUsuelleEtablissement":null},{"dateFin":"2011-02-16","dateDebut":"2011-01-01","denominationUsuelleEtablissement":null}]},{"siret":"53008580200037","uniteLegale":{"denominationUniteLegale":"FACEBOOK FRANCE"},"adresseEtablissement":{"codePostalEtablissement":"75002","libelleCommuneEtablissement":"PARIS 2"},"periodesEtablissement":[{"dateFin":null,"dateDebut":"2018-06-26","denominationUsuelleEtablissement":null},{"dateFin":"2018-06-25","dateDebut":"2016-04-18","denominationUsuelleEtablissement":null}]}]}'
As per the official documentation, there are multiple layers :
- Legal entity : Identified by a
siren
- Establishment : Identified by a
siret
– which is the SIREN 5 characters. Basically here I'm getting a list of establishments, since I'm requesting only one SIREN - Periods : Which is archived/change tracking of several attributes. With a
dateDebut
(starting date) and adateFin
(end date) and attributes, among which thedenominationUsuelleEtablissement
I'd like to have the current values.
Despite putting a date in my request, the API responded with several periods, not only the one that includes the date I'm providing. Hence adding a layer.
What I'm trying to do is to flatten the response and convert it to a dataframe for easier use.
What I've done is the following. Using json.normalize
:
import json
import pandas as pd
contenuReponse = json.loads(response.text)
etablissements = contenuReponse['etablissements']
df = pd.json_normalize(etablissements)
df
I get this:
siret | periodesEtablissements | uniteLegale.denominationUniteLegale | |
---|---|---|---|
0 | 53008580200011 | [{'dateFin': None, 'dateDebut': '2012-06-14', ... | FACEBOOK FRANCE |
1 | 53008580200037 | [{'dateFin': None, 'dateDebut': '2018-06-26', ... | FACEBOOK FRANCE |
json_normalize
ignores the layer below periodesEtablissement
.
Is there a way to flatten successfully the whole response. Or, use only the periodesEtablissement
relevant to my request. Meaning the one with no dateFin
?
CodePudding user response:
You could try using this function:
def flatten_nested_json_df(df):
df = df.reset_index()
s = (df.applymap(type) == list).all()
list_columns = s[s].index.tolist()
s = (df.applymap(type) == dict).all()
dict_columns = s[s].index.tolist()
while len(list_columns) > 0 or len(dict_columns) > 0:
new_columns = []
for col in dict_columns:
horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
horiz_exploded.index = df.index
df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
new_columns.extend(horiz_exploded.columns) # inplace
for col in list_columns:
print(f"exploding: {col}")
df = df.drop(columns=[col]).join(df[col].explode().to_frame())
new_columns.append(col)
s = (df[new_columns].applymap(type) == list).all()
list_columns = s[s].index.tolist()
s = (df[new_columns].applymap(type) == dict).all()
dict_columns = s[s].index.tolist()
return df
and do this:
flatten_nested_json_df(df)
which gives:
index siret uniteLegale.denominationUniteLegale \
0 0 53008580200011 FACEBOOK FRANCE
0 0 53008580200011 FACEBOOK FRANCE
0 0 53008580200011 FACEBOOK FRANCE
1 1 53008580200037 FACEBOOK FRANCE
1 1 53008580200037 FACEBOOK FRANCE
adresseEtablissement.codePostalEtablissement \
0 75116
0 75116
0 75116
1 75002
1 75002
adresseEtablissement.libelleCommuneEtablissement \
0 PARIS 16
0 PARIS 16
0 PARIS 16
1 PARIS 2
1 PARIS 2
periodesEtablissement.dateFin periodesEtablissement.dateDebut \
0 None 2012-06-14
0 2012-06-13 2011-02-17
0 2011-02-16 2011-01-01
1 None 2018-06-26
1 2018-06-25 2016-04-18
periodesEtablissement.denominationUsuelleEtablissement
0 None
0 None
0 None
1 None
1 None