I have a Pandas dataframe that has 3 columns:
id
external_id_client
custom_attributes
id | external_id_client | custom_attributes |
---|---|---|
1 | 2530 | [{'teste_id': 19225, 'custom_attribute_id': 47}] |
2 | 4050 | [{'teste_id': 1834, 'custom_attribute_id': 53}] |
3 | 4953 | [{'teste_id': 3295, 'custom_attribute_id': 8565}] |
The problem is that the custom_attributes
column is coming with a list and a json inside. I need to transform this column with the content in json into columns, with the key being the column name and content being the value.
This is the expected output, could you help me on how to do this in code?
id | external_id_client | teste_id | custom_attribute_id |
---|---|---|---|
1 | 2530 | 19225 | 47 |
2 | 4050 | 1834 | 53 |
3 | 4953 | 3295 | 8565 |
CodePudding user response:
Assuming every element of the column custom_attributes contains a list with a dictionary with the same keys, you could simply iterate over the keys and successively add new columns for each key and then drop the redundant initial column, as follows:
df = pd.DataFrame([
[1, 2530, [{'teste_id': 19225, 'custom_attribute_id': 47}]],
[2, 4050, [{'teste_id': 1834, 'custom_attribute_id': 53}]],
[3, 4953, [{'teste_id': 3295, 'custom_attribute_id': 8565}]]
], index=[1, 2, 3], columns=['id', 'external_id_client', 'custom_attributes'])
def transform(df):
df2 = df.copy()
keys = list(df['custom_attributes'].values[0][0].keys())
for key in keys:
df2[key] = df['custom_attributes'].apply(lambda x: x[0][key])
df2.drop(columns=['custom_attributes'], inplace=True)
return df2
df2 = transform(df)
CodePudding user response:
With apply
you can extract the json column:
df[['teste_id', 'custom_attribute_id']] = df['custom_attributes'].apply(
lambda x: pd.Series(x[0].values()))
And finally drop
the column:
df.drop('custom_attributes', axis=1, inplace=True)
CodePudding user response:
One option is with Pandas string methods:
(df
.assign(teste_id = df.custom_attributes.str[0].str['teste_id'],
custom_attribute_id = df.custom_attributes.str[0].str['custom_attribute_id'])
.drop(columns = 'custom_attributes')
)
id external_id_client teste_id custom_attribute_id
1 1 2530 19225 47
2 2 4050 1834 53
3 3 4953 3295 8565