Home > Software design >  parse specific key in json to dataframe column
parse specific key in json to dataframe column

Time:01-31

How do you parse a specific key to multiple rows and column from a response?

I am getting a response:

response = requests.post('https://api.kewords.com/v1/get_keyword_data', data=my_data, headers=my_headers)

which returns:

{
"data": {
    "heart disease": {
        "vol": 18100,
        "cpc": {
            "currency": "$",
            "value": "2.01"
        },
        "keyword": "keyword planner",
        "competition": 0.21,
        "trend": [
            {
                "month": "January",
                "year": 2022,
                "value": 18100
            },
            {
                "month": "February",
                "year": 2022,

However, when I normalize it before I save it to a .csv, it places everything into a single row:

enter image description here

How do you parse a specific field, like data.heart disease.trend in my example, with separate columns for year and month based on value to a pandas dataframe? Every method from normalizing it to dealing with it as a dict has failed.

my code:

my_data = {
    'country': 'us',
    'currency': 'USD',
    'dataSource': 'cli',
    
    f'kw[{keyword}]': ["keywords tool", "keyword planner"]
}
my_headers = {
    'Accept': 'application/json',
    'Authorization': ''
}
response = requests.post('https://api.keywords.com/v1/get_keyword_data', data=my_data, headers=my_headers)

#write to json
with open('output.json', 'wb') as outf:
    outf.write(response.content)
    
if response.status_code == 200:
    print('success\n\n', response.content.decode('utf-8'))
else:
    print("An error occurred\n\n", response.content.decode('utf-8'))

CodePudding user response:

Using json_normalize:

data = json.loads(response.text)

df = pd.json_normalize(
    data=data["data"]["heart disease"],
    meta=["vol", "keyword", "competition", "cpc"],
    record_path="trend"
)
df = pd.concat([df.drop(["cpc"], axis=1), df["cpc"].apply(pd.Series)], axis=1)
print(df)

Output:

      month  year  value    vol          keyword competition currency value
0   January  2022  18100  18100  keyword planner        0.21        $  2.01
1  February  2022  20000  18100  keyword planner        0.21        $  2.01
  • Related