Home > Software engineering >  Convert csv file which is pipe delimited to JSON format
Convert csv file which is pipe delimited to JSON format

Time:10-14

I am trying to convert CSV file which is pipe delimited to JSON file and SetrecordID will be List and Pairs will be list of dictionaries.

CSV Input File

SetrecordID|SetMatchScore|Pairs
100,101,102|90|"100-101,40","100-102,80","101-102,90"
103,104,105|80|"103-104,60","103-105,90","104-105,90"
106,107,108|65|"106-107,55","106-108,60","107-108,80"
109,110,111|95|"109-110,85","109-111,100","110-111,100"

Expected output in JSON

[
    {
        Date: system date
        SetrecordID:[100,101,102]
        SetMatchScore:90
        Pairs:[
            {Pair1:100-101,matchscore:40},
            {Pair2:100-102,matchscore:80},
            {Pair3:101-102,matchscore:90}
              ]
    },
    {
        Date: system date
        SetrecordID:[103,104,105]
        SetMatchScore:90
        Pairs:[
            {Pair1:103-104,matchscore:60},
            {Pair2:103-105,matchscore:90},
            {Pair3:104-105,matchscore:90}
              ]
    }
]

Getting Output in JSON

   {
   "SetrecordID":[
      "109",
      "110",
      "111"
   ],
   "SetMatchScore":95,
   "Pairs":"109-110,85,\"109-111,100\",\"110-111,100\""
}

Code Tried

df = pd.read_csv('filename.csv',sep="|")
dict_val = {}
for index, row in df.iterrows():
row["SetrecordID"] = row["SetrecordID"].split(",")
dict_val.update(row)

print(dict_val)

CodePudding user response:

Just Pandas won't help you here. (Happily, you also don't need Pandas here.)

import csv, io, ast, json

# Using a `StringIO` here instead of reading from a file
# (but since `StringIO`s are file-like, you can substitute
#  an `open()` call here.)

data = io.StringIO(
    """
SetrecordID|SetMatchScore|Pairs
100,101,102|90|"100-101,40","100-102,80","101-102,90"
103,104,105|80|"103-104,60","103-105,90","104-105,90"
106,107,108|65|"106-107,55","106-108,60","107-108,80"
109,110,111|95|"109-110,85","109-111,100","110-111,100"
""".strip()
)

rows = []

for row in csv.DictReader(data, delimiter="|", quoting=csv.QUOTE_NONE):
    pairs = [pair.split(",", 1) for pair in ast.literal_eval(row["Pairs"])]
    row["Pairs"] = [
        {f"Pair{x}": key, "matchscore": int(val)}
        for x, (key, val) in enumerate(pairs, 1)
    ]
    row["SetrecordID"] = row["SetrecordID"].split(",")
    rows.append(row)

with open("data.json", "w") as outf:
    json.dump(rows, outf, indent=2)

will ingest that data to a dicts you can work with (or just output to a JSON file):

{'SetrecordID': ['100', '101', '102'], 'SetMatchScore': '90', 'Pairs': [{'Pair1': '100-101', 'matchscore': 40}, {'Pair2': '100-102', 'matchscore': 80}, {'Pair3': '101-102', 'matchscore': 90}]}
{'SetrecordID': ['103', '104', '105'], 'SetMatchScore': '80', 'Pairs': [{'Pair1': '103-104', 'matchscore': 60}, {'Pair2': '103-105', 'matchscore': 90}, {'Pair3': '104-105', 'matchscore': 90}]}
{'SetrecordID': ['106', '107', '108'], 'SetMatchScore': '65', 'Pairs': [{'Pair1': '106-107', 'matchscore': 55}, {'Pair2': '106-108', 'matchscore': 60}, {'Pair3': '107-108', 'matchscore': 80}]}
{'SetrecordID': ['109', '110', '111'], 'SetMatchScore': '95', 'Pairs': [{'Pair1': '109-110', 'matchscore': 85}, {'Pair2': '109-111', 'matchscore': 100}, {'Pair3': '110-111', 'matchscore': 100}]}
  • Related