I am trying to convert CSV file which is pipe delimited to JSON file and SetrecordID will be List and Pairs will be list of dictionaries.
CSV Input File
SetrecordID|SetMatchScore|Pairs
100,101,102|90|"100-101,40","100-102,80","101-102,90"
103,104,105|80|"103-104,60","103-105,90","104-105,90"
106,107,108|65|"106-107,55","106-108,60","107-108,80"
109,110,111|95|"109-110,85","109-111,100","110-111,100"
Expected output in JSON
[
{
Date: system date
SetrecordID:[100,101,102]
SetMatchScore:90
Pairs:[
{Pair1:100-101,matchscore:40},
{Pair2:100-102,matchscore:80},
{Pair3:101-102,matchscore:90}
]
},
{
Date: system date
SetrecordID:[103,104,105]
SetMatchScore:90
Pairs:[
{Pair1:103-104,matchscore:60},
{Pair2:103-105,matchscore:90},
{Pair3:104-105,matchscore:90}
]
}
]
Getting Output in JSON
{
"SetrecordID":[
"109",
"110",
"111"
],
"SetMatchScore":95,
"Pairs":"109-110,85,\"109-111,100\",\"110-111,100\""
}
Code Tried
df = pd.read_csv('filename.csv',sep="|")
dict_val = {}
for index, row in df.iterrows():
row["SetrecordID"] = row["SetrecordID"].split(",")
dict_val.update(row)
print(dict_val)
CodePudding user response:
Just Pandas won't help you here. (Happily, you also don't need Pandas here.)
import csv, io, ast, json
# Using a `StringIO` here instead of reading from a file
# (but since `StringIO`s are file-like, you can substitute
# an `open()` call here.)
data = io.StringIO(
"""
SetrecordID|SetMatchScore|Pairs
100,101,102|90|"100-101,40","100-102,80","101-102,90"
103,104,105|80|"103-104,60","103-105,90","104-105,90"
106,107,108|65|"106-107,55","106-108,60","107-108,80"
109,110,111|95|"109-110,85","109-111,100","110-111,100"
""".strip()
)
rows = []
for row in csv.DictReader(data, delimiter="|", quoting=csv.QUOTE_NONE):
pairs = [pair.split(",", 1) for pair in ast.literal_eval(row["Pairs"])]
row["Pairs"] = [
{f"Pair{x}": key, "matchscore": int(val)}
for x, (key, val) in enumerate(pairs, 1)
]
row["SetrecordID"] = row["SetrecordID"].split(",")
rows.append(row)
with open("data.json", "w") as outf:
json.dump(rows, outf, indent=2)
will ingest that data to a dicts you can work with (or just output to a JSON file):
{'SetrecordID': ['100', '101', '102'], 'SetMatchScore': '90', 'Pairs': [{'Pair1': '100-101', 'matchscore': 40}, {'Pair2': '100-102', 'matchscore': 80}, {'Pair3': '101-102', 'matchscore': 90}]}
{'SetrecordID': ['103', '104', '105'], 'SetMatchScore': '80', 'Pairs': [{'Pair1': '103-104', 'matchscore': 60}, {'Pair2': '103-105', 'matchscore': 90}, {'Pair3': '104-105', 'matchscore': 90}]}
{'SetrecordID': ['106', '107', '108'], 'SetMatchScore': '65', 'Pairs': [{'Pair1': '106-107', 'matchscore': 55}, {'Pair2': '106-108', 'matchscore': 60}, {'Pair3': '107-108', 'matchscore': 80}]}
{'SetrecordID': ['109', '110', '111'], 'SetMatchScore': '95', 'Pairs': [{'Pair1': '109-110', 'matchscore': 85}, {'Pair2': '109-111', 'matchscore': 100}, {'Pair3': '110-111', 'matchscore': 100}]}