Here is the type of json file that I am working with
{
"header": {
"gtfsRealtimeVersion": "1.0",
"incrementality": "FULL_DATASET",
"timestamp": "1656447045"
},
"entity": [{
"id": "RTVP:T:16763243",
"isDeleted": false,
"vehicle": {
"trip": {
"tripId": "16763243",
"scheduleRelationship": "SCHEDULED"
},
"position": {
"latitude": 33.497833,
"longitude": -112.07365,
"bearing": 0.0,
"odometer": 16512.0,
"speed": 1.78816
},
"currentStopSequence": 16,
"currentStatus": "INCOMING_AT",
"timestamp": "1656447033",
"stopId": "2792",
"vehicle": {
"id": "5074"
}
}
}, {
"id": "RTVP:T:16763242",
"isDeleted": false,
"vehicle": {
"trip": {
"tripId": "16763242",
"scheduleRelationship": "SCHEDULED"
},
"position": {
"latitude": 33.562374,
"longitude": -112.07392,
"bearing": 359.0,
"odometer": 40367.0,
"speed": 15.6464
},
"currentStopSequence": 36,
"currentStatus": "INCOMING_AT",
"timestamp": "1656447024",
"stopId": "2794",
"vehicle": {
"id": "5251"
}
}
},
In my code, I am taking in the json as a string. But when I try normalize json string to put into data frame
import pandas as pd
import json
import requests
base_URL = requests.get('https://app.mecatran.com/utw/ws/gtfsfeed/vehicles/valleymetro?apiKey=4f22263f69671d7f49726c3011333e527368211f&asJson=true')
packages_json = base_URL.json()
packages_str = json.dumps(packages_json, indent=1)
df = pd.json_normalize(packages_str)
I get this error, I am definitely making some rookie error, but how exactly am I using this wrong? Are there additional arguments that may need in that?
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-33-aa23f9157eac> in <module>()
8 packages_str = json.dumps(packages_json, indent=1)
9
---> 10 df = pd.json_normalize(packages_str)
/usr/local/lib/python3.7/dist-packages/pandas/io/json/_normalize.py in _json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep, max_level)
421 data = list(data)
422 else:
--> 423 raise NotImplementedError
424
425 # check to see if a simple recursive function is possible to
NotImplementedError:
When I had the json format within my code without the header portion referenced as an object, the pd.json_normalize(package_str) does work, why would that be, and what additional things would I need to do?
CodePudding user response:
The issue is, that pandas.json_normalize expects either a dictionary or a list of dictionaries but json.dumps returns a string.
It should work if you skip the json.dumps and directly input the json to the normalizer, like this:
import pandas as pd
import json
import requests
base_URL = requests.get('https://app.mecatran.com/utw/ws/gtfsfeed/vehicles/valleymetro?apiKey=4f22263f69671d7f49726c3011333e527368211f&asJson=true')
packages_json = base_URL.json()
df = pd.json_normalize(packages_json)
If you take a look at the corresponding source-code of pandas you can see for yourself:
if isinstance(data, list) and not data:
return DataFrame()
elif isinstance(data, dict):
# A bit of a hackjob
data = [data]
elif isinstance(data, abc.Iterable) and not isinstance(data, str):
# GH35923 Fix pd.json_normalize to not skip the first element of a
# generator input
data = list(data)
else:
raise NotImplementedError
You should find this code at the path that is shown in the stacktrace, with the error raised on line 423: /usr/local/lib/python3.7/dist-packages/pandas/io/json/_normalize.py
I would advise you to use a code-linter or an IDE that has one included (like PyCharm for example) as this is the type of error that doesn't happen if you have one.
CodePudding user response:
I m not sure where is the problem, but if you are desperate, you can always make text function that will data-mine that Json.
Yes, it will be quite tiring, but with -10 variables you need to mine, for each row, you will be done in -60 minutes no problem.
Something like this:
def MineJson(text, target): #target is for example "id"
#print(text)
findword = text.find(target)
g=findword len(target) 5 #should not include the first "
new_text= text[g:] #output should be starting with RTVP:T...
return new_text
def WhatsAfter(text): #should return new text and RTVP:T:16763243
#print(text)
toFind='"'
findEnd = text.find(toFind)
g=findEnd
value=text[:g]
new_text= text[g:]
return new_text,value
I wrote it without testing, so maybe there will be some mistakes.