Home > OS >  Create a DataFrame from a JSON File
Create a DataFrame from a JSON File

Time:10-22

I am trying to load a JSON file into a DataFrame, I know this question was answered multiple times but I've beein trying all the possible solutions but with no success, here is how my JSON file looks like:

{
"event": {
    "origin": "devicename",
    "module": "",
    "interface": "",
    "component": "",
    "payload": "{\"typeofsensor\" : \"US_distance\",\"distance\": 2}"
}}
{
"event": {
    "origin": "devicename",
    "module": "",
    "interface": "",
    "component": "",
    "payload": "{\"typeofsensor\" : \"mpu6050\",\"accelX\": 0.06, \"accelY\": 0.50, \"accelZ\": -0.88, \"temp\": 25.45}"
}}

What I am trying to do is to extract the information in the "payload" in order to create a DataFrame with columns are typeofsensor and the sensor values since I have diffrent types of sensors.


I tried:

data = []
for line in open('data.JSON', 'r'):
    data.append(json.loads(line))

I got this error:

JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)

I also tried:

df = pd.read_json('data.JSON', lines=True)

I got this error:

ValueError                                Traceback (most recent call last)
C:\Users\DEVELO~1\AppData\Local\Temp/ipykernel_204/911564313.py in <module>
----> 1 df = pd.read_json('data.JSON', lines=True)

~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    205                 else:
    206                     kwargs[new_arg_name] = new_arg_value
--> 207             return func(*args, **kwargs)
    208 
    209         return cast(F, wrapper)

~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)
    312 
    313         return wrapper

~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\json\_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, encoding_errors, lines, chunksize, compression, nrows, storage_options)
    612 
    613     with json_reader:
--> 614         return json_reader.read()
    615 
    616 

~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\json\_json.py in read(self)
    744                 data = ensure_str(self.data)
    745                 data_lines = data.split("\n")
--> 746                 obj = self._get_object_parser(self._combine_lines(data_lines))
    747         else:
    748             obj = self._get_object_parser(self.data)

~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\json\_json.py in _get_object_parser(self, json)
    768         obj = None
    769         if typ == "frame":
--> 770             obj = FrameParser(json, **kwargs).parse()
    771 
    772         if typ == "series" or obj is None:

~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\json\_json.py in parse(self)
    883 
    884         else:
--> 885             self._parse_no_numpy()
    886 
    887         if self.obj is None:

~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\json\_json.py in _parse_no_numpy(self)
   1138         if orient == "columns":
   1139             self.obj = DataFrame(
-> 1140                 loads(json, precise_float=self.precise_float), dtype=None
   1141             )
   1142         elif orient == "split":

ValueError: Expected object or value

CodePudding user response:

Change your file into a valid format like below (eliminate {} from the middle and change key names to become unique):

{
    "event1": {
        "origin": "devicename",
        "module": "",
        "interface": "",
        "component": "",
        "payload": "{\"typeofsensor\" : \"US_distance\",\"distance\": 2}"
    },

    "event2": {
        "origin": "devicename",
        "module": "",
        "interface": "",
        "component": "",
        "payload": "{\"typeofsensor\" : \"mpu6050\",\"accelX\": 0.06, \"accelY\": 0.50, \"accelZ\": -0.88, \"temp\": 25.45}"
    }
}

easily read the file:

pd.read_json('path to/your_file.json')
#output
                                                   event1                                             event2
origin                                         devicename                                         devicename
module                                                                                                      
interface                                                                                                   
component                                                                                                   
payload    {"typeofsensor" : "US_distance","distance": 2}  {"typeofsensor" : "mpu6050","accelX": 0.06, "a...
  • Related