How to create a JSON file from a CSV file with the following format

Name,abc
Title,teacher
Email,abc.edu
Phone,000-000-0000
Office,21building
About,"abc is teacher"
Name,def
Title,plumber
Email,[email protected]
Phone,111-111-1111
Office,22building
About,"The best plumber in the town"
Name,ghi
Title,producer
Phone,333-333-3333
Office,33building
About,"The best producer"

Answer:

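I would use the pandas library to read the .csv data (foo.csv in this example) and then convert it to JSON using to_json. Note that this maps each label to a single value, so it only fits a file that describes one person. With the three people above, the repeated labels (Name, Title, ...) are not split into separate records.

If you only need the JSON text, this returns it as a string holding one object keyed by label: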

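import pandas as pd

# index_col=0 makes the labels the index; .squeeze('columns') turns the single
# remaining column into a Series (the squeeze=True argument was removed in pandas 2.0)
json_text = (
    pd.read_csv('foo.csv', header=None, index_col=0)
    .squeeze('columns')
    .to_json(orient='index')  # {"label": value, ...}
)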
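The result of to_json is a JSON string, not a Python dict. A minimal sketch of the single-person case, assuming the CSV is inlined through StringIO instead of read from foo.csv, shows the string and how to parse it back into a dict with the standard json module:

```python
import json
from io import StringIO

import pandas as pd

# Single-person input inlined for illustration (stands in for foo.csv)
single = """Name,abc
Title,teacher
Email,abc.edu
Phone,000-000-0000
Office,21building
About,"abc is teacher"
"""

# Labels become the index; squeeze the one remaining column into a Series
series = pd.read_csv(StringIO(single), header=None, index_col=0).squeeze("columns")

json_text = series.to_json(orient="index")  # a str such as '{"Name":"abc",...}'
as_dict = json.loads(json_text)             # parse it if you need a real dict
print(as_dict["Title"])  # teacher
```

If a dict is all you need, series.to_dict() gives the same mapping without the JSON round trip.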

If you want to write it to a .json file:

import pandas as pd

series = pd.read_csv('foo.csv', header=None, index_col=0).squeeze('columns')
# to_json accepts a file path (or an open file object) and writes the JSON there
series.to_json('exported_file.json', orient='index')

Answer:

I assume that the CSV file contains sequential records about people in the format "Label,Value", and that you'd like to reorganize it into separate records, one per person, with the labels as column names. The output is to be stored as a JSON file.

If so, we can use pandas.DataFrame.pivot to change the structure of the data. Before that, we have to group the labels by person. For this, I assume that the Name label is present for every person and comes first in each record, and that each label appears at most once per person:

import pandas as pd
from io import StringIO

data = '''Name,abc
Title,teacher
Email,abc.edu
Phone,000-000-0000
Office,21building
About,"abc is teacher"
Name,def
Title,plumber
Email,[email protected]
Phone,111-111-1111
Office,22building
About,"The best plumber in the town"
Name,ghi
Title,producer
Phone,333-333-3333
Office,33building
About,"The best producer"'''

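# Read the two columns as "label" and "value"
df = pd.read_csv(StringIO(data), names=['label','value'])
# Person id: increases by one at every "Name" row
df['grouper'] = (df['label'] == 'Name').cumsum()
# One row per person, one column per label (missing labels become NaN)
df = df.pivot(index='grouper', columns='label', values='value')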
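To see why the grouper works, here is a small sketch on a two-person excerpt of the label column. Comparing with "Name" gives True at the start of each person, and the running sum of those booleans increases exactly there, so every row gets its person's number. This is also why each label must appear at most once per person: otherwise pivot finds two values for the same (person, label) cell and raises a ValueError.

```python
import pandas as pd

# Label column in file order (a two-person excerpt of the data above)
labels = pd.Series(["Name", "Title", "Phone", "Name", "Title"])

# True on every "Name" row; the cumulative sum of those booleans goes up by one
# where a new person starts, so it can serve as a person id
grouper = (labels == "Name").cumsum()
print(grouper.tolist())  # [1, 1, 1, 2, 2]
```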

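With the data in this shape, we can save it as JSON Lines, one object per person per line. Labels a person lacks, such as ghi's Email, are written as null: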

df.to_json('test.json', orient='records', lines=True)
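If you need a single JSON array instead, df.to_json('test.json', orient='records') without lines=True writes one, still with null for missing labels. A minimal sketch of an alternative that omits missing labels altogether, using the standard json module, follows; the inline excerpt and the file name people.json are assumptions for illustration:

```python
import json
from io import StringIO

import pandas as pd

# Excerpt of the data above; "def" has no Office entry
data = """Name,def
Title,plumber
Phone,111-111-1111
Name,ghi
Title,producer
Phone,333-333-3333
Office,33building
"""

df = pd.read_csv(StringIO(data), names=["label", "value"])
df["grouper"] = (df["label"] == "Name").cumsum()
df = df.pivot(index="grouper", columns="label", values="value")

# One dict per person, dropping the NaN cells for labels that person lacks
records = [row.dropna().to_dict() for _, row in df.iterrows()]

# Hypothetical output file: a JSON array of objects
with open("people.json", "w", encoding="utf-8") as f:
    json.dump(records, f, indent=2)
```

Going through plain dicts keeps each object limited to the labels that actually appeared for that person, at the cost of a Python-level loop over the rows.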