Python Pandas How to Assign JSON Values to a Pandas DF-CodePudding

I am requesting data from an API like this:

output = requests.get(url=url, auth=oauth, headers=headers, data=payload)
output_data_test = output.json()

output_data_test["customers"][0]["id"]
'1'

When I do this, the value does not get assigned:

combined_output_temp_df = output_data_test["customers"][0]["id"]

combined_output_temp_df
Empty DataFrame

What am I doing wrong?

Here is how I created my dataframe:

combined_output_temp_df = pd.DataFrame(
    columns = [
        "id",
        "first_name",
        "last_name",
        "middle_initial",
        "email",
        ### - Preferences
        "preference_email_invoices",
        "preference_print_invoices",
        "preference_exclude_from_insurance_auto_enroll_on",
        ###
        "username",
        "created_at",,
        "blocked_payments",
        ### - Phone Numbers
        "phone_number_id",
        "phone_number_primary",
        ### - Mailing Address
        "mailing_address_id",
        "mailing_address_address1",
        "mailing_address_address2",
        "mailing_address_city",
        "mailing_address_state",
        "mailing_address_latitude",
        ### - Addresses
        "address_id",
        "address_address1", 
        "address_address2",
        "address_city",
        "address_state",
        "address_invalid_data",
        "address_label"
        ###
        ]
    )

Here is roughly what the JSON looks like:

{
    'customers': [
        {
            'id': '1', 
            'first_name': 'James', 
            'last_name': 'Test', 
            'middle_initial': '', 
            'email': '[email protected]', 
            'preferences': {
                'email_invoices': False, 
                'print_invoices': False, 
                'exclude_from_insurance_auto_enroll_on': None
                }, 
                
                'username': jamestesting, 
                'created_at': '2021-03-11T13:00:00.404-05:00', 
                'blocked_payments': False, 
                'phone_numbers': [
                    {
                        'id': '234234asdf', 
                        'primary': True
                    }, 
                    {
                        'id': '8438c19a', 
                        'primary': False
                    }
                ], 'mailing_address': {
                    'id': '431fe0b2', 
                    'address1': '15777 Fake Blvd', 
                    'address2': 'Lot 196', 
                    'city': 'Testing', 
                    'state': 'TX', 
                    'latitude': None
                }, 'addresses': [
                    {
                        'id': '431fe0b2', 
                        'address1': '157 whatever', 
                        'address2': 'Lot 196', 
                        'city': 'Sacramento', 
                        'state': 'NY', 
                        'invalid_data': False, 
                        'label': 'Home'
                    }
                ]
          }
   ]

}

Some customers have multiple phone numbers, some have 0 phone numbers. Same with mailing address and other attributes not found here. When I tried to use explode, it gave me an error message

CodePudding user response：

You are not defining the column of the dataframe where you want that string to get added / replaced. There are two ways of doing it either pass it as an argument when you are creating the dataframe, or if you are getting the value after creating the dateframe you can assign it to the corresponding column.

See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

Following is the pseudocode:

import pandas as pd

test_var = '1' 
print(test_var)

new_df = pd.DataFrame( [test_var], dtype="string" )
print(new_df)

new_df['col'] = '2'
print(new_df)

For multiple columns you can do something along the following lines:


combined_output_temp_df = pd.DataFrame(
    columns = [
        "id",
        "first_name",
        "last_name",
        "middle_initial",
        "email"
        ]
    )

combined_output_temp_df = [ 'test_id', 'test_fn', 'test_ln', 'test_middle', 'email' ]

print(combined_output_temp_df);

CodePudding user response：

I am not giving you a complete answer but this could give you a starter.

First use json_normalize on top level to create a DataFrame.

df = pd.json_normalize(output_data_test, record_path=['customers'])

Then, if you need a single phone number or addresses per row,

df = df.explode('phone_numbers')
df = df.explode('addresses')

Then you can flatten the phone_numbers/addresses columns.

Or alternatively,

df = pd.json_normalize(output_data_test, record_path=['customers'])
df_phone = pd.json_normalize(output_data_test['customers'], record_path=['phone_numbers'], meta=['id'], record_prefix='phone_numbers_')
df = df.merge(df_phone, on='id', how='left')

Then do the similar thing with addresses.