Say I have a JSON file with lines of data like this:
file.json:
{'ID':'098656', 'query':'query_file.txt'}
{'A':1, 'B':2}
{'A':3, 'B':6}
{'A':0, 'B':4}
...
where the first line only describes the file and how it was created. I would like to open it with something like:
import pandas as pd
df = pd.read_json('file.json', lines=True)
However, how do I read the data starting on line 3? I know that pd.read_csv has a skiprows argument, but it does not look like pd.read_json has one.
I would like something that returns a DataFrame with only the columns A and B, ideally without loading the whole file and then dropping the first row and the ID and query columns.
CodePudding user response:
You can read the lines of the file, skip the first n, and pass the rest to pandas:
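import json
import pandas as pd

with open('filename.json') as f:
    # Drop the first two lines; adjust the slice to skip a different number
    lines = f.read().splitlines()[2:]

# One raw JSON string per row
df_tmp = pd.DataFrame(lines, columns=['json_data'])
# Decode each row, then expand the resulting dicts into columns
df = pd.json_normalize(df_tmp['json_data'].apply(json.loads).tolist())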
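For a quick check of this line-skipping idea, here is a minimal self-contained sketch. It assumes every line holds valid JSON with double quotes (the single-quoted sample in the question would be rejected by json.loads). The SAMPLE string and the load_json_lines helper are made up for illustration and are not part of pandas.

```python
import io
import json
from itertools import islice

import pandas as pd

# Hypothetical sample mirroring the question, rewritten as valid JSON.
SAMPLE = """\
{"ID": "098656", "query": "query_file.txt"}
{"A": 1, "B": 2}
{"A": 3, "B": 6}
{"A": 0, "B": 4}
"""


def load_json_lines(handle, skiprows):
    """Parse a JSON Lines stream into a DataFrame, ignoring the first `skiprows` lines."""
    # islice skips the leading lines without first reading the whole file into a list.
    data_lines = islice(handle, skiprows, None)
    # Ignore blank lines (such as a trailing newline) before decoding each record.
    records = [json.loads(line) for line in data_lines if line.strip()]
    return pd.DataFrame(records)


df = load_json_lines(io.StringIO(SAMPLE), skiprows=1)
print(df)
```

Because the header line is never decoded, the ID and query columns do not appear in the result. The same helper should work with a real file by passing open('file.json') in place of the StringIO object.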
CodePudding user response:
We can also pass a file handle to pandas.read_json. If part of the file has already been read through that handle, only the remainder is converted to a DataFrame.
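import pandas as pd

def read_json(file, skiprows):
    with open(file) as f:
        # Consume one line per skipped row; readlines(n) takes a size hint in bytes, not a line count
        for _ in range(skiprows):
            f.readline()
        # read_json parses only what is left in the file handle
        return pd.read_json(f, lines=True)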
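To try the file-handle approach end to end, the sketch below writes a small sample to a temporary directory and reads it back. The function name read_json_skipping and the sample contents are assumptions for illustration. Note that file.readlines(n) treats n as a size hint rather than a line count, so this version calls readline() once per skipped line.

```python
import os
import tempfile

import pandas as pd


def read_json_skipping(path, skiprows):
    """Read a JSON Lines file into a DataFrame, ignoring the first `skiprows` lines."""
    with open(path) as f:
        # Advance the handle past the lines we do not want.
        for _ in range(skiprows):
            f.readline()
        # pandas reads from the current position of the handle onward.
        return pd.read_json(f, lines=True)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.json")
    # Hypothetical file contents: one header line, then three data lines.
    with open(path, "w") as out:
        out.write('{"ID": "098656", "query": "query_file.txt"}\n')
        out.write('{"A": 1, "B": 2}\n')
        out.write('{"A": 3, "B": 6}\n')
        out.write('{"A": 0, "B": 4}\n')
    df = read_json_skipping(path, skiprows=1)

print(df)
```

Here skiprows=1 matches a file with a single descriptive line at the top; use a larger value if more lines precede the data.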