I have a pandas dataframe like the example below:
id isDeleted properties
1 False {'taxa': {'value': '1,49%', 'timestamp': 1627418661971, 'source': 'IMPORT', 'sourceId': '13212434', 'updatedByUserId': 4875, 'versions': [{'name': 'taxa', 'value': '1,49%', 'timestamp': 1626565659525, 'sourceId': '151951296', 'source': 'IMPORT', 'sourceVid': [], 'sourceMetadata': '44524858245==', 'updatedByUserId': 12847957}]}, 'num_associated': {'value': '1', 'timestamp': 1619471103959, 'source': 'CALCULATED', 'sourceId': 'RollupProperties', 'updatedByUserId': None, 'versions': [{'name': 'num_associated', 'value': '1', 'timestamp': 1619471103959, 'sourceId': 'RollupProperties', 'source': 'CALCULATED', 'sourceVid': []}]}
2 False {'num_associated': {'value': '1', 'timestamp': 1619471103670, 'source': 'CALCULATED', 'sourceId': 'RollupProperties', 'updatedByUserId': None, 'versions': [{'name': 'num_associated', 'value': '1', 'timestamp': 1619471103670, 'sourceId': 'RollupProperties', 'source': 'CALCULATED', 'sourceVid': []}]}, 'time_available': {'value': '12313', 'timestamp': 1621361432266, 'source': 'BATCH_UPDATE', 'sourceId': 'userId:12847957', 'updatedByUserId': 1200111, 'versions': [{'name': 'time_available', 'value': '12313', 'timestamp': 1655565656598, 'sourceId': 'userId:1200111', 'source': 'BATCH_UPDATE', 'sourceVid': [], 'requestId': '9c86-4ae8-14185858', 'updatedByUserId': 1200111}]}, 'createdate': {'value': '1612285284796', 'timestamp': 1612285284796, 'source': 'IMPORT', 'sourceId': None, 'updatedByUserId': None, 'versions': [{'name': 'createdate', 'value': '1612285284796', 'timestamp': 1612285284796, 'source': 'IMPORT', 'sourceVid': []}]}
How can I transform the column properties to have just the key and value like bellow:
id isDeleted properties
1 False {"taxa": "1,49%", "num_associated": "1"}"
2 False {"num_associated":"3", "time_available": "12313", "createdate": "1612285284796"}
The json keys and values will always be dynamic.
CodePudding user response:
You can try
df['properties'] = df['properties'].apply(lambda d: {k:v.get('value') for k, v in d.items()})
print(df)
properties
0 {'taxa': '1,49%', 'num_associated': '1'}
1 {'num_associated': '1', 'time_available': '12313', 'createdate': '1612285284796'}