Extract Dataframe from multiple JSONs in a file

Time:02-01

I have a dataset in an example.txt file containing about 80k rows. Its format is like this:

{..., "text": ..., "class": ..., ...}
{..., "text": ..., "class": ..., ...}
{..., "text": ..., "class": ..., ...}

Each line is a separate JSON object, and there are no commas between the objects. I want to extract the text and class fields into a Pandas DataFrame that looks like this:

Text Class
Text1 Class1
Text2 Class2

How can I do this?

Many thanks in advance!

CodePudding user response:

Use pd.read_json with lines=True. This one-object-per-line format is known, not surprisingly, as JSON Lines:

import pandas as pd
df = pd.read_json('path/to/your/file.json', lines=True)[['text', 'class']]
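
To see the approach end to end, here is a minimal sketch that writes a small JSON Lines file and reads it back. The sample records, the extra "id" field, and the helper name load_text_and_class are made up for illustration; the renaming to Text and Class is only an assumption about the desired column headers.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample records standing in for the real 80k-row file.
records = [
    {"id": 1, "text": "Text1", "class": "Class1"},
    {"id": 2, "text": "Text2", "class": "Class2"},
]


def load_text_and_class(path):
    """Read a JSON Lines file and keep only the text and class columns."""
    df = pd.read_json(path, lines=True)
    # Renaming is optional; it just matches the headers shown in the question.
    return df[["text", "class"]].rename(columns={"text": "Text", "class": "Class"})


# Write one JSON object per line, with no commas between the objects.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as fh:
    for rec in records:
        fh.write(json.dumps(rec) + "\n")
    sample_path = fh.name

df = load_text_and_class(sample_path)
os.remove(sample_path)  # clean up the temporary file
print(df)
```

The file extension does not matter to pd.read_json, so a .txt file like the one in the question should work as long as every line is a complete JSON object.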