Making dictionary in dictionary to separate data by the same values in one column and then from seco

Time:03-28

I am new to Python and have been stuck on one problem for a few days now. I made a script that:

- takes data from a CSV file
- sorts it by identical values in the first column of the data file
- inserts the sorted data at a specified line in a separate template text file
- saves the file in as many copies as there are different values in the first column of the data file

The picture below shows how it works:

enter image description here

But there are two more things I need to do. When one of the separate files shown above contains the same value from the second column of the data file more than once, the file should insert only the value from the third column instead of repeating the same second-column value. The picture below shows how it should look:

enter image description here

I also need to add, somewhere in the file, the value of the first column of the data file split on "_".

Here is the data file:

111_0,3005,QWE
111_0,3006,SDE
111_0,3006,LFR
111_1,3005,QWE
111_1,5345,JTR
112_0,3103,JPP
112_0,3343,PDK 
113_0,2137,TRE
113_0,2137,OMG

and here is the code I made:

import shutil

with open("data.csv") as f:
    contents = f.read()
    contents = contents.splitlines()

values_per_baseline = dict()

for line in contents:
    key = line.split(',')[0]
    values = line.split(',')[1:]
    if key not in values_per_baseline:
        values_per_baseline[key] = []
    values_per_baseline[key].append(values)

for file in values_per_baseline.keys():
    x = 3  # index in the template at which each new entry is inserted
    filename = "of_%s.txt" % file
    shutil.copyfile("of.txt", filename)
    for values in values_per_baseline[file]:
        with open(filename, "r") as f:
            contents = f.readlines()
            contents.insert(x, '      o = ' + values[0] + '\n          ' + 'a = ' + values[1] + '\n')
        with open(filename, "w") as f:
            contents = "".join(contents)
            f.write(contents)

I have been trying to make something like a dictionary of dictionaries of lists, but I can't implement it correctly. Any help or suggestions will be much appreciated.

CodePudding user response:

You could try the following:

import csv
from collections import defaultdict


values_per_baseline = defaultdict(lambda: defaultdict(list))
with open("data.csv", "r") as file:
    for key1, key2, value in csv.reader(file):
        values_per_baseline[key1][key2].append(value)

x = 3
for filekey, content in values_per_baseline.items():
    with open("of.txt", "r") as fin,\
         open(f"of_{filekey}.txt", "w") as fout:
        fout.writelines(next(fin) for _ in range(x))
        for key, values in content.items():
            fout.write(
                f'      o = {key}\n'
                + '          a = '
                + ', '.join(values)
                + '\n'
            )
        fout.writelines(fin)

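The input-reading part uses the csv module from the standard library (for convenience) and a defaultdict. The file is read into a nested dictionary that maps each first-column value to a dictionary of second-column values, each holding a list of third-column values. The output part copies the first x lines of the template, writes one block per second-column value, and then copies the rest of the template.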
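
To see the shape of the nested dictionary without touching any files, here is a minimal sketch that feeds a few of the sample rows from the question through the same defaultdict pattern. The in-memory io.StringIO source and the names sample and plain are assumptions made for illustration only.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the question; StringIO stands in for data.csv.
sample = io.StringIO(
    "111_0,3005,QWE\n"
    "111_0,3006,SDE\n"
    "111_0,3006,LFR\n"
    "111_1,3005,QWE\n"
)

# Outer key: first column; inner key: second column;
# value: list of third-column entries sharing both keys.
values_per_baseline = defaultdict(lambda: defaultdict(list))
for key1, key2, value in csv.reader(sample):
    values_per_baseline[key1][key2].append(value.strip())

# Convert to plain dicts so the structure prints cleanly.
plain = {k: dict(v) for k, v in values_per_baseline.items()}
print(plain)
```

The two rows that share 111_0 and 3006 end up in one list, ['SDE', 'LFR'], which is the grouping the question asks for.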

CodePudding user response:

Content of datafile.csv:

111_0,3005,QWE
111_0,3006,SDE
111_0,3006,LFR
111_1,3005,QWE
111_1,5345,JTR
112_0,3103,JPP
112_0,3343,PDK 
113_0,2137,TRE
113_0,2137,OMG

A possible solution is the following:

def nested_list_to_dict(lst):
    result = {}
    subgroup = {}
    if all(len(l) == 3 for l in lst):
        for first, second, third in lst:
            result.setdefault(first, []).append((second, third))
        for k, v in result.items():
            for item1, item2 in v:
                subgroup.setdefault(item1, []).append(item2.strip())
            result[k] = subgroup
            subgroup = {}
    else:
        print("Input data must have 3 items like '111_0,3005,QWE'")
    return result


with open("datafile.csv", "r", encoding="utf-8") as f:
    content = f.read().splitlines()

data = nested_list_to_dict([line.split(',') for line in content])
print(data)

# ... rest of your code ....

Prints

{'111_0': {'3005': ['QWE'], '3006': ['SDE', 'LFR']}, 
 '111_1': {'3005': ['QWE'], '5345': ['JTR']}, 
 '112_0': {'3103': ['JPP'], '3343': ['PDK']}, 
 '113_0': {'2137': ['TRE', 'OMG']}}
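
As one way to fill in the "rest of your code" step, the sketch below turns one entry of the nested dictionary into the lines to insert, and splits the first-column value on "_" as the question asks. The helper names render_block and split_key, the ", " separator, and the "o = " / "a = " layout (borrowed from the question's code) are assumptions, since the expected output was only shown in a picture.

```python
def split_key(key):
    """Split a first-column value such as '111_0' into its parts."""
    return key.split("_")


def render_block(groups):
    """Build the template lines for one output file.

    groups maps a second-column value to its list of third-column values.
    """
    lines = []
    for second, thirds in groups.items():
        lines.append(f"      o = {second}\n")
        lines.append("          a = " + ", ".join(thirds) + "\n")
    return lines


# Hypothetical input: one entry of the nested dictionary printed above.
data = {"111_0": {"3005": ["QWE"], "3006": ["SDE", "LFR"]}}

for key, groups in data.items():
    base, index = split_key(key)  # '111_0' -> '111', '0'
    block = render_block(groups)
    print(base, index)
    print("".join(block), end="")
```

The returned list can be spliced into the template's lines at the desired index (for example with slice assignment) before the file is written out.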