Can't process JSON dataset due to missing commas?

Time:05-17

I have a JSON dataset that I'd like to use for an ML project. There is no comma (,) after each record, so I'm not able to load it using pandas. What could I do to correct the format of the file? The link to the dataset is https://www.kaggle.com/datasets/rmisra/news-category-dataset

CodePudding user response:

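Each line of the file is its own JSON object (the JSON Lines format), so the file as a whole is not a single JSON document and no commas are actually missing. You can parse the lines one at a time and put them in a list to form a df: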

import json
import pandas as pd

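# The file holds one JSON object per line, with no enclosing array.
with open('News_Category_Dataset_v2.json', 'r') as f:
    # Parse each line separately; each resulting dict becomes one row.
    df = pd.DataFrame([json.loads(line) for line in f])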

print(df)

Output:

             category  ...        date
0               CRIME  ...  2018-05-26
1       ENTERTAINMENT  ...  2018-05-26
2       ENTERTAINMENT  ...  2018-05-26
3       ENTERTAINMENT  ...  2018-05-26
4       ENTERTAINMENT  ...  2018-05-26
...               ...  ...         ...
200848           TECH  ...  2012-01-28
200849         SPORTS  ...  2012-01-28
200850         SPORTS  ...  2012-01-28
200851         SPORTS  ...  2012-01-28
200852         SPORTS  ...  2012-01-28

[200853 rows x 6 columns]
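
If you do want to correct the format of the file itself, as the question asks, one option is to rewrite it as a single JSON array. The following is a minimal sketch using only the standard library. The function name `jsonl_to_json_array` and the output filename in the usage note are hypothetical, and it assumes the file is UTF-8 encoded and small enough to hold in memory:

```python
import json


def jsonl_to_json_array(src_path, dst_path):
    """Rewrite a file with one JSON object per line as a single JSON array.

    Returns the number of records written.
    """
    records = []
    with open(src_path, "r", encoding="utf-8") as src:
        for line in src:
            line = line.strip()
            if not line:  # skip blank lines, e.g. a trailing empty line
                continue
            records.append(json.loads(line))

    # json.dump emits the list as "[{...}, {...}, ...]", commas included.
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(records, dst)
    return len(records)
```

Calling `jsonl_to_json_array('News_Category_Dataset_v2.json', 'news_array.json')` would then write a file that any standard JSON parser can load in one go. If the only goal is a DataFrame, the conversion can be skipped entirely, since `pd.read_json('News_Category_Dataset_v2.json', lines=True)` reads the line-per-record format directly.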