CSV to JSON of Lists-CodePudding

Currently, I have a CSV file with the following example --

File	skill	experience	overall_experience
1	Java	1.5	3
1	Python	1.0	3
1	SQL	0.5	3

There are multiple entries for many such files but I need to merge the skills and their respective experience into a single value belonging to a single key, something like this -

{
"1": {
    "file": "1",
    "skill": ["Java", "Python", "SQL"],
    "experience": [1.5, 1.0, 0.5]
    "Overall_exp": 3.0
},}

I tried a Python Code for this but it is giving me only the value of last skill and last experience (and not the whole thing in a list)

Here is the code I was using --

import csv
import json


# Function to convert a CSV to JSON
# Takes the file paths as arguments
def make_json(csvFilePath, jsonFilePath):
    # create a dictionary
    data = {}
    # Open a csv reader called DictReader
    with open(csvFilePath, encoding='utf-8') as csvf:
        csvReader = csv.DictReader(csvf)
        # Convert each row into a dictionary
        # and add it to data
        for rows in csvReader:
            # Assuming a column named 'file' to
            # be the primary key
            key = rows['file']
            data[key] = rows

    # Open a json writer, and use the json.dumps()
    # function to dump data
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonf.write(json.dumps(data, indent=4))




# Decide the two file paths according to your
# computer system
csvFilePath = 'skill_matrix.csv'
jsonFilePath = 'skill_matrix.json'

# Call the make_json function
make_json(csvFilePath, jsonFilePath)

The output that I get here is this --

{
"1": {
    "file": "1",
    "skill": "SQL",
    "experience": "0.5"
    "Overall_exp": "3.0"
},}

How can I convert it to the former json format and not the latter?

CodePudding user response：

You can use pandas to read your csv, group by File and export to json:

df = pd.read_csv(your_csv)
df = df.groupby('File', as_index=False).agg({'skill': list, 'experience': list, 'overall_experience': np.mean})
print(df.to_json(orient='index', indent=4))

Note: you can specify the aggregation functions for your columns in a dictionary

Output:

{
    "0":{
        "File":1,
        "skill":[
            "Java",
            "Python",
            "SQL"
        ],
        "experience":[
            1.5,
            1.0,
            0.5
        ],
        "overall_experience":3.0
    }
}

CodePudding user response：

I think that loading into Pandas first and then going from all data to the narrowing strategy is cleaner and easier. You can use the following code for parsing your data into JSON files;

import pandas as pd
import json

# Load the CSV into Pandas
df = pd.read_csv('1.csv', header=0)
data = df.to_dict(orient='list')

# Delete / change as you wish 
data['File'] = str(data['File'][0])
data['overall_experience'] = data['overall_experience'][0]

# Save as json 
with open('1.json', 'w', encoding='utf-8') as jsonf:
    jsonf.write(json.dumps(data, indent=4))

Result (1.json)

{
    "File": "1",
    "skill": [
        "Java",
        "Python",
        "SQL"
    ],
    "experience": [
        1.5,
        1.0,
        0.5
    ],
    "overall_experience": 3
}

I suppose hat you have multiple file id in a CSV file. Your given example is too minimalistic. Anyhow, then you can create a master dictionary and add your smaller ones as follows;

import pandas as pd
import json

# Load the CSV into Pandas
df = pd.read_csv('1.csv', header=0)

# Master dictionary
master_dict = {}

for idx, file_id in enumerate(df["File"].unique()):
  data = df[df['File'] == file_id].to_dict(orient='list')

  # Delete / change as you wish 
  data['File'] = str(data['File'][0])
  data['overall_experience'] = data['overall_experience'][0]

  master_dict[idx] = data

# Save as json 
with open('1.json', 'w', encoding='utf-8') as jsonf:
    jsonf.write(json.dumps(master_dict, indent=4))

Result (1.json)

{
    "0": {
        "File": "1",
        "skill": [
            "Java",
            "Python",
            "SQL"
        ],
        "experience": [
            1.5,
            1.0,
            0.5
        ],
        "overall_experience": 3
    },
    "1": {
        "File": "2",
        "skill": [
            "Java",
            "Python"
        ],
        "experience": [
            2.0,
            2.5
        ],
        "overall_experience": 1
    }
}