Pandas DataFrame read_json for list values


I have a file with record JSON strings, one object per line, like:

{"foo": [-0.0482006893, 0.0416476727, -0.0495583452]}
{"foo": [0.0621534586, 0.0509529933, 0.122285351]}
{"foo": [0.0169468746, 0.00475309044, 0.0085169]}

When I call read_json on this file, I get a DataFrame where the foo column has dtype object. Calling .to_numpy() on this DataFrame gives me a NumPy array of the form:

array([list([-0.050888903400000005, -0.00733460533, -0.0595958121]),
       list([0.10726073400000001, -0.0247702841, -0.0298063811]), ...,
       list([-0.10156482500000001, -0.0402663834, -0.0609775148])],
      dtype=object)

I want the values of foo parsed as NumPy arrays instead of lists. Does anyone have any ideas?
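
For context, here is a minimal sketch of how such a file would be loaded, assuming it is newline-delimited JSON (the filename is hypothetical):

import pandas as pd

# one JSON object per line, so read as line-delimited JSON
df = pd.read_json("records.json", lines=True)
df.dtypes   # foo is reported as object; each cell holds a Python list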

CodePudding user response:

The easiest way is to create your DataFrame with pd.DataFrame.from_dict().

Here is a minimal example with one of your dicts:

import pandas as pd

d = {"foo": [-0.0482006893, 0.0416476727, -0.0495583452]}
df = pd.DataFrame.from_dict(d)
>>> df
        foo
0 -0.048201
1  0.041648
2 -0.049558
>>> df.dtypes
foo    float64
dtype: object
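
As a side note, with this layout .to_numpy() already yields a plain float64 array instead of an object array of lists:

>>> df['foo'].to_numpy().dtype
dtype('float64')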

CodePudding user response:

How about doing:

import numpy as np

df['foo'] = df['foo'].apply(np.array)  # convert each per-row list to a numpy array
df

                                                 foo
0       [-0.0482006893, 0.0416476727, -0.0495583452]
1  [0.0621534586, 0.0509529933, 0.12228535100000001]
2  [0.0169468746, 0.00475309044, 0.00851689999999...

Checking the element types confirms that the values have been converted to numpy.ndarray instances:

df['foo'].apply(type)

0    <class 'numpy.ndarray'>
1    <class 'numpy.ndarray'>
2    <class 'numpy.ndarray'>
Name: foo, dtype: object
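
If the goal is a single 2-D array for the whole column rather than one array per row, a minimal sketch (assuming every list has the same length) is to stack the column:

import numpy as np

# stack the per-row lists into one (n_rows, n_features) float array
arr = np.stack(df['foo'].to_numpy())
arr.shape   # (3, 3) for the three example records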