How do I extract data in JSON in a Pandas column and structure it by stacking it?-CodePudding

Suppose I have the following DataFame:

df = pd.DataFrame(
    {
        'id':[112, 130, 200],
        'name':['Lara', 'Kasia', 'Rocco'],
        'level/income':[{1:400, 2:800}, {1:100}, {1:1000, 2:2000, 3:5000}]
    }
)

I would like this to be the end result:

I tried to do this on my own but I failed spectacularly. I was trying to recreate the original DataFrame by finding which id and name would appear multiple times but this proved too much.

len_need = []
id2 = []
name2 = []
level2 = []
income2 = []

for index,row in df.iterrows():
    len_need.append(len(row['level/income']))

I'm sure there's a better way to do this.

CodePudding user response：

try this:

(df.set_index(['id', 'name'])
 .squeeze()
 .apply(pd.Series)
 .rename_axis('level', axis=1)
 .stack()
 .reset_index(name='income'))

>>>
    id  name    level   income
0   112 Lara    1       400.0
1   112 Lara    2       800.0
2   130 Kasia   1       100.0
3   200 Rocco   1       1000.0
4   200 Rocco   2       2000.0
5   200 Rocco   3       5000.0

CodePudding user response：

Variation using json_normalize:

from pandas import DataFrame, json_normalize

df = DataFrame(
    {
        "id": [112, 130, 200],
        "name": ["Lara", "Kasia", "Rocco"],
        "level/income": [{1: 400, 2: 800}, {1: 100}, {1: 1000, 2: 2000, 3: 5000}],
    }
)

income_cols = list(range(1, 3   1))
id_cols = ["id", "name"]

final = (
    df["level/income"]
    .pipe(json_normalize)
    .assign(id=df["id"], name=df["name"])
    .melt(
        id_vars=id_cols, value_vars=income_cols, var_name="level", value_name="income"
    )
    .dropna()
)

print(final)
#     id   name level  income
# 0  112   Lara     0   400.0
# 1  130  Kasia     0   100.0
# 2  200  Rocco     0  1000.0
# 3  112   Lara     1   800.0
# 5  200  Rocco     1  2000.0
# 8  200  Rocco     2  5000.0