Home > Back-end >  looping through list of json arrays within pandas dataframe column
looping through list of json arrays within pandas dataframe column

Time:03-04

I am trying to extract all of the 'uid' values into a simple python list for each row of the 'data' column and have been unsuccessful.

Here is the code so far:

import json

df = pd.DataFrame({
    'bank_account': [101, 102, 201, 301],
    'data': [
        '[{"uid": 100, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "Alice"},{"uid": 150, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "jer"}]',
        '[{"uid": 100, "account_type": 2, "account_data": {"currency": {"current": 2000, "minimum": 0},  "fees": {"monthly": 0}}, "user_name": "Alice"}]',
        '[{"uid": 200, "account_type": 1, "account_data": {"currency": {"current": 3000, "minimum": 0},  "fees": {"monthly": 13.5}}, "user_name": "Bob"}]',        
        '[{"uid": 300, "account_type": 1, "account_data": {"currency": {"current": 4000, "minimum": 0},  "fees": {"monthly": 13.5}}, "user_name": "Carol"}]'        
    ]},
    index = ['Alice', 'Alice', 'Bob', 'Carol']
)

# df["data"] = df["data"].apply(lambda x: pd.read_json(x, lines=True)["uid"][0])

df["data"] = [[json.loads(d)['uid'] for d in li] for li in df['data']]
print(df)

CodePudding user response:

You could iterate over the list created by json.loads and get "uid" values:

import json
df['data'] = [[item['uid'] for item in json.loads(d)] for d in df['data']]

Output:

       bank_account        data
Alice           101  [100, 150]
Alice           102       [100]
Bob             201       [200]
Carol           301       [300]

You could also explode "data":

df = df.explode('data')

Output:

       bank_account data
Alice           101  100
Alice           101  150
Alice           102  100
Bob             201  200
Carol           301  300

CodePudding user response:

Modified apply a bit as you're using apply

import pandas as pd

df = pd.DataFrame(
    {
        "bank_account": [101, 102, 201, 301],
        "data": [
            '[{"uid": 100, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "Alice"},{"uid": 150, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "jer"}]',
            '[{"uid": 100, "account_type": 2, "account_data": {"currency": {"current": 2000, "minimum": 0},  "fees": {"monthly": 0}}, "user_name": "Alice"}]',
            '[{"uid": 200, "account_type": 1, "account_data": {"currency": {"current": 3000, "minimum": 0},  "fees": {"monthly": 13.5}}, "user_name": "Bob"}]',
            '[{"uid": 300, "account_type": 1, "account_data": {"currency": {"current": 4000, "minimum": 0},  "fees": {"monthly": 13.5}}, "user_name": "Carol"}]',
        ],
    },
    index=["Alice", "Alice", "Bob", "Carol"],
)
df["uid"] = df["data"].apply(lambda x: eval(x)[0]["uid"])
print(df)

Then for list you can do print(list(df["uid"]))

  • Related