I have a dataset in an example.txt file containing about 80k rows. Its format is like this:
{..., "text": ..., "class": ..., ...}
{..., "text": ..., "class": ..., ...}
{..., "text": ..., "class": ..., ...}
The JSON objects are not separated by commas; there is one per line. I want to extract the text and class columns into a Pandas DataFrame that looks like this:
Text | Class |
---|---|
Text1 | Class1 |
Text2 | Class2 |
How can I do this?
Many thanks in advance!
CodePudding user response:
Just use pd.read_json with lines=True (this format of JSON is called, not surprisingly, JSON Lines):
import pandas as pd
df = pd.read_json('path/to/your/file.json', lines=True)[['text', 'class']]
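To try this without the real file, here is a minimal sketch that feeds two made-up records through the same call. The sample records, the extra "id" field, and the rename to Text/Class are assumptions for illustration, not taken from your data.

```python
import io

import pandas as pd

# Hypothetical sample standing in for example.txt: one JSON object per line.
sample = io.StringIO(
    '{"id": 1, "text": "Text1", "class": "Class1"}\n'
    '{"id": 2, "text": "Text2", "class": "Class2"}\n'
)

# lines=True tells pandas to parse each line as a separate JSON object.
df = pd.read_json(sample, lines=True)[["text", "class"]]

# Optional: rename the columns to match the capitalized headers in the question.
df = df.rename(columns={"text": "Text", "class": "Class"})

print(df)
```

The file extension should not matter here, so passing the path to example.txt in place of the StringIO object should work the same way.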