I have a graphql query that returns a string of JSON-formatted data, with 3 separate JSON objects inside. It looks like this:
{
"data": {
"readingsList": [
{
"value": 137,
"millis": 1651449224000
},
{
"value": 141,
"millis": 1651448924000
}
],
"messagesList": [
{
"value": 138,
"dateMillis": 1651445346000,
"text": "foo",
"type": "bar",
"field1": False
}
]
"userList": [
{
"userTimezone": "America/Los_Angeles"
}
]
}
}
What I'm trying to do is
- Merge the first two objects (
readingsList
andmessagesList
) based on the time (millis
anddateMillis
) into a dataframe - Convert that time into a UTC datetime value (e.g. 1651449224000 becomes 2022-05-01 18:53:44)
- Convert the UTC datetime value into local time for the user based on the users Timezone from
userList
Desired output:
df.head(3)
datetime value text type field1 ...
2022-05-01 18:53:44 137 NA NA NA
2022-05-01 18:48:44 141 NA NA NA
2022-05-01 17:49:06 138 foo bar False
I can do steps 2 and 3 but I don't know how to do step 1.
If I convert the string using json.loads()
and pd.read_json()
I get the following output:
import json
import pandas as pd
json_str = load_data_gql(...)
j = json.loads(json_str)
df = pd.read_json(j)
df.head()
data
groupsList [{'userTimezone': 'America/Los_Angeles'}]
messagesList [{'value': 138, 'dateMillis': 1651445346000, ...
readingsList [{'value': 137, 'millis': 1651449224000}, {'value'...
I now suspect that the answer has to somehow do with json_normalize() but I'm having difficulty applying what I read in that documentation to navigate my JSON objects properly.
Any advice or help would be greatly appreciated, thank you so much in advance.
CodePudding user response:
Proposed Solution:
Merging the dataframes in this case can be done with pandas.concat([df_1,df_2])
Here's the code I used:
import json
import pandas as pd
json_obj = json.load(open('json_str_file.json', 'r')) # if reading from file
# json_obj = json.loads(json_str) # if reading from a string
# create two separate frames from each nested dictionary object
df_1 = pd.DataFrame.from_dict(json_obj['data']['messagesList'])
df_2 = pd.DataFrame.from_dict(json_obj['data']['readingsList'])
# set the index to the column you want to merge them on
df_1.set_index('dateMillis', inplace=True)
df_2.set_index('millis', inplace=True)
# use pd.concat to stack the dataframes together
df_merged = pd.concat([df_1,df_2])
# fix field1 to be a boolean field
df_merged['field1'] = df_merged['field1'].astype(bool)
# confirm the result matches the target
print(df_merged)
Output
value text type field1
1651445346000 138 foo bar False
1651449224000 137 NaN NaN True
1651448924000 141 NaN NaN True
From here you should be able to do steps 2 and 3 from your post.
Issues with the JSON
The example you gave had some formatting issues that might cause some confusion. messagesList
and readingsList
needed to be separated by a ',' for me. Also json.load()
didn't like the value of False
in my example.
Here is the reformatted JSON
{
"data": {
"readingsList": [
{
"value": 137,
"millis": 1651449224000
},
{
"value": 141,
"millis": 1651448924000
}
],
"messagesList": [
{
"value": 138,
"dateMillis": 1651445346000,
"text": "foo",
"type": "bar",
"field1": 0
}
],
"userList": [
{
"userTimezone": "America/Los_Angeles"
}
]
}
}
Potential Confusion:
- JSON string could be formatted poorly
json.loads()
returns an object of typedict
with nested elements.pd.read_json()
expects a object of typestr
- using
pd.DataFrame.from_dict()
works withdict
objects and allows you to address the nested components like this:j['data']['messagesList']