Currently, I have a CSV file with the following example --
File | skill | experience | overall_experience |
---|---|---|---|
1 | Java | 1.5 | 3 |
1 | Python | 1.0 | 3 |
1 | SQL | 0.5 | 3 |
There are multiple entries for many such files but I need to merge the skills and their respective experience into a single value belonging to a single key, something like this -
{
"1": {
"file": "1",
"skill": ["Java", "Python", "SQL"],
"experience": [1.5, 1.0, 0.5]
"Overall_exp": 3.0
},}
I tried a Python Code for this but it is giving me only the value of last skill and last experience (and not the whole thing in a list)
Here is the code I was using --
import csv
import json
# Function to convert a CSV to JSON
# Takes the file paths as arguments
def make_json(csvFilePath, jsonFilePath):
# create a dictionary
data = {}
# Open a csv reader called DictReader
with open(csvFilePath, encoding='utf-8') as csvf:
csvReader = csv.DictReader(csvf)
# Convert each row into a dictionary
# and add it to data
for rows in csvReader:
# Assuming a column named 'file' to
# be the primary key
key = rows['file']
data[key] = rows
# Open a json writer, and use the json.dumps()
# function to dump data
with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
jsonf.write(json.dumps(data, indent=4))
# Decide the two file paths according to your
# computer system
csvFilePath = 'skill_matrix.csv'
jsonFilePath = 'skill_matrix.json'
# Call the make_json function
make_json(csvFilePath, jsonFilePath)
The output that I get here is this --
{
"1": {
"file": "1",
"skill": "SQL",
"experience": "0.5"
"Overall_exp": "3.0"
},}
How can I convert it to the former json format and not the latter?
CodePudding user response:
You can use pandas to read your csv, group by File and export to json:
df = pd.read_csv(your_csv)
df = df.groupby('File', as_index=False).agg({'skill': list, 'experience': list, 'overall_experience': np.mean})
print(df.to_json(orient='index', indent=4))
Note: you can specify the aggregation functions for your columns in a dictionary
Output:
{
"0":{
"File":1,
"skill":[
"Java",
"Python",
"SQL"
],
"experience":[
1.5,
1.0,
0.5
],
"overall_experience":3.0
}
}
CodePudding user response:
I think that loading into Pandas first and then going from all data to the narrowing strategy is cleaner and easier. You can use the following code for parsing your data into JSON files;
import pandas as pd
import json
# Load the CSV into Pandas
df = pd.read_csv('1.csv', header=0)
data = df.to_dict(orient='list')
# Delete / change as you wish
data['File'] = str(data['File'][0])
data['overall_experience'] = data['overall_experience'][0]
# Save as json
with open('1.json', 'w', encoding='utf-8') as jsonf:
jsonf.write(json.dumps(data, indent=4))
Result (1.json)
{
"File": "1",
"skill": [
"Java",
"Python",
"SQL"
],
"experience": [
1.5,
1.0,
0.5
],
"overall_experience": 3
}
I suppose hat you have multiple file id in a CSV file. Your given example is too minimalistic. Anyhow, then you can create a master dictionary and add your smaller ones as follows;
import pandas as pd
import json
# Load the CSV into Pandas
df = pd.read_csv('1.csv', header=0)
# Master dictionary
master_dict = {}
for idx, file_id in enumerate(df["File"].unique()):
data = df[df['File'] == file_id].to_dict(orient='list')
# Delete / change as you wish
data['File'] = str(data['File'][0])
data['overall_experience'] = data['overall_experience'][0]
master_dict[idx] = data
# Save as json
with open('1.json', 'w', encoding='utf-8') as jsonf:
jsonf.write(json.dumps(master_dict, indent=4))
Result (1.json)
{
"0": {
"File": "1",
"skill": [
"Java",
"Python",
"SQL"
],
"experience": [
1.5,
1.0,
0.5
],
"overall_experience": 3
},
"1": {
"File": "2",
"skill": [
"Java",
"Python"
],
"experience": [
2.0,
2.5
],
"overall_experience": 1
}
}