Home > Software engineering >  Transform pandas column with Json content
Transform pandas column with Json content

Time:04-19

I have a Pandas dataframe that has 3 columns: id external_id_client custom_attributes

id external_id_client custom_attributes
1 2530 [{'teste_id': 19225, 'custom_attribute_id': 47}]
2 4050 [{'teste_id': 1834, 'custom_attribute_id': 53}]
3 4953 [{'teste_id': 3295, 'custom_attribute_id': 8565}]

The problem is that the custom_attributes column is coming with a list and a json inside. I need to transform this column with the content in json into columns, with the key being the column name and content being the value.

This is the expected output, could you help me on how to do this in code?

id external_id_client teste_id custom_attribute_id
1 2530 19225 47
2 4050 1834 53
3 4953 3295 8565

CodePudding user response:

Assuming every element of the column custom_attributes contains a list with a dictionary with the same keys, you could simply iterate over the keys and successively add new columns for each key and then drop the redundant initial column, as follows:

df = pd.DataFrame([
    [1, 2530, [{'teste_id': 19225, 'custom_attribute_id': 47}]],
    [2, 4050, [{'teste_id': 1834, 'custom_attribute_id': 53}]],
    [3, 4953, [{'teste_id': 3295, 'custom_attribute_id': 8565}]]
], index=[1, 2, 3], columns=['id', 'external_id_client', 'custom_attributes'])

def transform(df):
    df2 = df.copy()
    keys = list(df['custom_attributes'].values[0][0].keys())
    for key in keys:
        df2[key] = df['custom_attributes'].apply(lambda x: x[0][key])
    df2.drop(columns=['custom_attributes'], inplace=True)
    return df2

df2 = transform(df)

CodePudding user response:

With apply you can extract the json column:

df[['teste_id', 'custom_attribute_id']] = df['custom_attributes'].apply(
lambda x: pd.Series(x[0].values()))

And finally drop the column:

df.drop('custom_attributes', axis=1, inplace=True)

CodePudding user response:

One option is with Pandas string methods:

 (df
.assign(teste_id = df.custom_attributes.str[0].str['teste_id'], 
        custom_attribute_id = df.custom_attributes.str[0].str['custom_attribute_id'])
.drop(columns = 'custom_attributes')
)

   id  external_id_client  teste_id  custom_attribute_id
1   1                2530     19225                   47
2   2                4050      1834                   53
3   3                4953      3295                 8565
  • Related