|   | name | age | address |
|---|---|---|---|
| 1 | "Steve" | 27 | {"number": 4, "street": "Main Road", "city": "Oxford"} |
| 2 | "Adam" | 32 | {"number": 78, "street": "High St", "city": "Cambridge"} |
However, the subdocuments just appear as JSON inside the address cell when I load the collection like this:
from pandas import DataFrame
df = DataFrame(list(db.collection_name.find({})))
print(df)
How can I get a second table like the one below using Python?
What is the approach after this step?
|   | name | age | address.number | address.street | address.city |
|---|---|---|---|---|---|
| 1 | "Steve" | 27 | 4 | "Main Road" | "Oxford" |
| 2 | "Adam" | 32 | 78 | "High St" | "Cambridge" |
CodePudding user response:
You can use pd.DataFrame to expand the JSON/dict in the address column into a dataframe of its contents. Then join the result with the original dataframe using .join(), as follows:
Optional step: If the JSON/dict values are actually stored as strings, convert them to proper dicts first. Otherwise, skip this step.
import ast
df['address'] = df['address'].map(ast.literal_eval)
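Note that ast.literal_eval only parses Python literals, so it raises an error on strict JSON that contains true, false, or null. A minimal sketch of a more tolerant converter, assuming a cell may already be a dict, a JSON string, or a Python-literal string (the helper name parse_cell is hypothetical, not part of pandas):

```python
import ast
import json


def parse_cell(value):
    """Return a dict for a cell that may be a dict, a JSON string, or a Python-literal string."""
    if isinstance(value, dict):
        return value  # already parsed, nothing to do
    try:
        return json.loads(value)  # strict JSON: double quotes, true/false/null
    except json.JSONDecodeError:
        return ast.literal_eval(value)  # Python literal: single quotes, True/False/None


print(parse_cell('{"number": 4, "active": true}'))
print(parse_cell("{'number': 78, 'street': 'High St'}"))
```

It could then be applied in place of the line above with df['address'] = df['address'].map(parse_cell).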
Main code:
import pandas as pd
df[['name', 'age']].join(pd.DataFrame(df['address'].tolist(), index=df.index).add_prefix('address.'))
Result:
    name  age  address.number  address.street  address.city
1  Steve   27               4       Main Road        Oxford
2   Adam   32              78         High St     Cambridge
Alternatively, if you only need a few columns from the JSON/dict, you can add them one by one using the string accessor str[], as follows:
df['address.number'] = df['address'].str['number']
df['address.street'] = df['address'].str['street']
df['address.city'] = df['address'].str['city']
Setup
import pandas as pd
data = {'name': {1: 'Steve', 2: 'Adam'},
        'age': {1: 27, 2: 32},
        'address': {1: {"number": 4, "street": "Main Road", "city": "Oxford"},
                    2: {"number": 78, "street": "High St", "city": "Cambridge"}}}
df = pd.DataFrame(data)
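As a further option that the answer above does not use, pandas also provides pd.json_normalize, which flattens nested dicts into dotted column names in a single call. A minimal sketch, assuming the documents are available as a list of dicts (as list(db.collection_name.find({})) would return; here a hard-coded, hypothetical records list stands in for the query):

```python
import pandas as pd

# Stand-in for list(db.collection_name.find({})); sample data only.
records = [
    {"name": "Steve", "age": 27,
     "address": {"number": 4, "street": "Main Road", "city": "Oxford"}},
    {"name": "Adam", "age": 32,
     "address": {"number": 78, "street": "High St", "city": "Cambridge"}},
]

# Nested keys become columns named '<parent>.<child>' (separator defaults to '.').
flat = pd.json_normalize(records)
print(flat)
```

Non-dict values such as an _id field are carried through as ordinary columns, so they may still need to be dropped afterwards.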
CodePudding user response:
Depending on the use case, it may make more sense to set up an aggregation pipeline and $project the necessary nested fields up to the top level:
df = pd.DataFrame(db.collection_name.aggregate([{
    '$project': {
        '_id': 0,
        'name': '$name',
        'age': '$age',
        # Raise sub-document fields to the top level under new names
        'address_number': '$address.number',
        'address_street': '$address.street',
        'address_city': '$address.city'
    }
}]))
df:
    name  age  address_number  address_street  address_city
0  Steve   27               4       Main Road        Oxford
1   Adam   32              78         High St     Cambridge
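When the list of sub-document fields is known but long, the $project stage can be generated rather than typed out. A minimal sketch in plain Python; the helper name flatten_projection and its parameters are hypothetical, and it only builds the stage dict, which would still have to be passed to db.collection_name.aggregate([...]):

```python
def flatten_projection(top_level, nested, sep="_"):
    """Build a $project stage that lifts nested fields to the top level.

    top_level: names of fields to keep as they are
    nested: mapping of sub-document name -> list of its field names
    """
    stage = {"_id": 0}
    for field in top_level:
        stage[field] = f"${field}"
    for parent, children in nested.items():
        for child in children:
            # e.g. 'address_number': '$address.number'
            stage[f"{parent}{sep}{child}"] = f"${parent}.{child}"
    return {"$project": stage}


stage = flatten_projection(["name", "age"], {"address": ["number", "street", "city"]})
print(stage)
```

The separator defaults to an underscore to match the column names used above.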
Or, if there are too many fields to list manually, we could also use $replaceRoot and $mergeObjects:
df = pd.DataFrame(db.collection_name.aggregate([
    {'$replaceRoot': {'newRoot': {'$mergeObjects': ["$$ROOT", "$address"]}}},
    {'$project': {'_id': 0, 'address': 0}}
]))
df:
    name  age  number     street       city
0  Steve   27       4  Main Road     Oxford
1   Adam   32      78    High St  Cambridge
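One caveat with this approach: $mergeObjects lets later documents overwrite earlier ones, so a sub-document field that shares a name with a top-level field would replace it. The plain-Python sketch below mimics the two stages on an in-memory document to show the effect. The function merge_address and the sample document with a clashing name key are hypothetical, and no MongoDB call is involved:

```python
def merge_address(doc):
    """Mimic $replaceRoot with $mergeObjects ["$$ROOT", "$address"], then drop _id and address."""
    merged = {**doc, **doc.get("address", {})}  # later keys win, as in $mergeObjects
    merged.pop("_id", None)
    merged.pop("address", None)
    return merged


# Hypothetical document whose sub-document reuses the top-level key 'name'.
doc = {"_id": 1, "name": "Steve", "age": 27,
       "address": {"name": "Rose Cottage", "city": "Oxford"}}
print(merge_address(doc))
```

Here the top-level name value is lost in the merged result, which is one reason to prefer the explicit $project version when field names might collide.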
collection_name setup:
# Drop Collection if exists
db.collection_name.drop()
# Insert Sample Documents
db.collection_name.insert_many([{
    'name': 'Steve', 'age': 27,
    'address': {"number": 4, "street": "Main Road", "city": "Oxford"}
}, {
    'name': 'Adam', 'age': 32,
    'address': {"number": 78, "street": "High St", "city": "Cambridge"}
}])