How to concatenate 2 keys in the same dictionary if they have similar names?

Time:12-13

I am working on a small project where I am developing a tool that analyses data from chemical experiments.

I imported data from CSV and made some adjustments with Pandas until I ended up with this dictionary:

{'6_batch_pfoa_zif-8-fg-6': [1, 1, 1],
 '6_batch_pfoa_zif-8-fg-24': [2, 2, 2],
 '6_batch_genx_zif-8-fg-6': [3, 3, 3],
 '6_batch_genx_zif-8-fg-24': [4, 4, 4],
 '6_batch_pfos_zif-8-fg-6': [5, 5, 5],
 '6_batch_pfos_zif-8-fg-24': [6, 6, 6],
 '7_batch_pfoa_zif-8-fg-6': [7, 7, 7],
 '7_batch_pfoa_zif-8-fg-24': [8, 8, 8],
 '7_batch_pfos_zif-8-fg-6': [9, 9, 9],
 '8_batch_pfoa_zif-8-fg-6': [10, 10, 10],
 '8_batch_pfoa_fg': [11, 11, 11],
 '8_batch_genx_fg': [12, 12, 12],
 '8_batch_pfos_zif-8-fg-6': [13, 13, 13],
 '8_batch_pfos_fg': [14, 14, 14]}

Each key represents the following:
6: number of experiment
batch: type of experiment
pfoa: type of chemical
zif-8-fg-6: type of metal used

Without going into too much detail, some experiments have the same variables and differ only in the experiment number. For example:

'7_batch_pfoa_zif-8-fg-6': [7, 7, 7]
'8_batch_pfoa_zif-8-fg-6': [10, 10, 10]

That means experiment number 8 is a redo of experiment number 7.

I want to write code that checks whether the type of experiment, type of chemical, and type of metal used are the same. If they are, it should concatenate the entries into 1 key with 6 values. As an example:

'7_batch_pfoa_zif-8-fg-6': [7, 7, 7]
'8_batch_pfoa_zif-8-fg-6': [10, 10, 10]

should turn into:

'7_batch_pfoa_zif-8-fg-6': [7, 7, 7, 10, 10, 10]

The experiment number is no longer relevant after merging the keys. It can be anything.

I tried to split the name with split_name = point.split('_') and compare split_name[1], split_name[2] and split_name[3], but I couldn't figure out how to loop over the keys in the dictionary and compare each one to the other keys.

Thanks in advance!

CodePudding user response:

Dictionaries can be iterated over by key, by value, or by both (i.e., items()). The code below takes advantage of that. The only tricky part is that I use an intermediate dictionary, data_mapper, to translate between the keys of raw_data and the keys of merged_data.

from pprint import pprint

raw_data = {
    '6_batch_pfoa_zif-8-fg-6': [1, 1, 1],
    '6_batch_pfoa_zif-8-fg-24': [2, 2, 2],
    '6_batch_genx_zif-8-fg-6': [3, 3, 3],
    '6_batch_genx_zif-8-fg-24': [4, 4, 4],
    '6_batch_pfos_zif-8-fg-6': [5, 5, 5],
    '6_batch_pfos_zif-8-fg-24': [6, 6, 6],
    '7_batch_pfoa_zif-8-fg-6': [7, 7, 7],
    '7_batch_pfoa_zif-8-fg-24': [8, 8, 8],
    '7_batch_pfos_zif-8-fg-6': [9, 9, 9],
    '8_batch_pfoa_zif-8-fg-6': [10, 10, 10],
    '8_batch_pfoa_fg': [11, 11, 11],
    '8_batch_genx_fg': [12, 12, 12],
    '8_batch_pfos_zif-8-fg-6': [13, 13, 13],
    '8_batch_pfos_fg': [14, 14, 14]
}

data_mapper = {}
merged_data = {}

for raw_key, raw_values in raw_data.items():

    # create a key excluding experiment number
    mapper_key = tuple(raw_key.split("_")[1:])
    
    # extend merged data entry if already exists
    if mapper_key in data_mapper:
        merged_key = data_mapper[mapper_key]
        merged_data[merged_key].extend(raw_values)

    # create merged data entry if does not exist
    else:
        data_mapper[mapper_key] = raw_key
        # copy the list so that later extend() calls do not modify raw_data
        merged_data[raw_key] = list(raw_values)

pprint(merged_data)

Output:

{'6_batch_genx_zif-8-fg-24': [4, 4, 4],
 '6_batch_genx_zif-8-fg-6': [3, 3, 3],
 '6_batch_pfoa_zif-8-fg-24': [2, 2, 2, 8, 8, 8],
 '6_batch_pfoa_zif-8-fg-6': [1, 1, 1, 7, 7, 7, 10, 10, 10],
 '6_batch_pfos_zif-8-fg-24': [6, 6, 6],
 '6_batch_pfos_zif-8-fg-6': [5, 5, 5, 9, 9, 9, 13, 13, 13],
 '8_batch_genx_fg': [12, 12, 12],
 '8_batch_pfoa_fg': [11, 11, 11],
 '8_batch_pfos_fg': [14, 14, 14]}
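
If the experiment number does not need to survive in the merged key, the same grouping can probably be done without the intermediate dictionary. The following is only a sketch of that idea using collections.defaultdict; the function name merge_runs is made up for illustration, and it assumes every key starts with the experiment number followed by an underscore, as in the question.

```python
from collections import defaultdict


def merge_runs(raw_data):
    """Group value lists by every key field except the experiment number."""
    merged = defaultdict(list)
    for raw_key, raw_values in raw_data.items():
        # e.g. ('batch', 'pfoa', 'zif-8-fg-6'); the leading number is dropped
        group_key = tuple(raw_key.split("_")[1:])
        # extend() copies the items, so the lists in raw_data stay untouched
        merged[group_key].extend(raw_values)
    return dict(merged)


sample = {
    '7_batch_pfoa_zif-8-fg-6': [7, 7, 7],
    '8_batch_pfoa_zif-8-fg-6': [10, 10, 10],
    '8_batch_pfoa_fg': [11, 11, 11],
}
merged_sample = merge_runs(sample)
print(merged_sample[('batch', 'pfoa', 'zif-8-fg-6')])
# [7, 7, 7, 10, 10, 10]
```

Keeping the key as a tuple makes it easy to filter later, for example by chemical with group_key[1].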

CodePudding user response:

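This method creates a new dict that holds one entry per unique experiment setup. It drops the experiment number from each key with the slice [2:], which assumes the number is a single digit followed by an underscore.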

d =  {'6_batch_pfoa_zif-8-fg-6': [1, 1, 1],
 '6_batch_pfoa_zif-8-fg-24': [2, 2, 2],
 '6_batch_genx_zif-8-fg-6': [3, 3, 3],
 '6_batch_genx_zif-8-fg-24': [4, 4, 4],
 '6_batch_pfos_zif-8-fg-6': [5, 5, 5],
 '6_batch_pfos_zif-8-fg-24': [6, 6, 6],
 '7_batch_pfoa_zif-8-fg-6': [7, 7, 7],
 '7_batch_pfoa_zif-8-fg-24': [8, 8, 8],
 '7_batch_pfos_zif-8-fg-6': [9, 9, 9],
 '8_batch_pfoa_zif-8-fg-6': [10, 10, 10],
 '8_batch_pfoa_fg': [11, 11, 11],
 '8_batch_genx_fg': [12, 12, 12],
 '8_batch_pfos_zif-8-fg-6': [13, 13, 13],
 '8_batch_pfos_fg': [14, 14, 14]}


d_new = {}
for k, v in d.items():
    if k[2:] in d_new:
        d_new[k[2:]] = d_new[k[2:]] + v
    else:
        d_new[k[2:]] = v

for k,v in d_new.items():
    print(k,v)

result:

batch_pfoa_zif-8-fg-6 [1, 1, 1, 7, 7, 7, 10, 10, 10]
batch_pfoa_zif-8-fg-24 [2, 2, 2, 8, 8, 8]
batch_genx_zif-8-fg-6 [3, 3, 3]
batch_genx_zif-8-fg-24 [4, 4, 4]
batch_pfos_zif-8-fg-6 [5, 5, 5, 9, 9, 9, 13, 13, 13]
batch_pfos_zif-8-fg-24 [6, 6, 6]
batch_pfoa_fg [11, 11, 11]
batch_genx_fg [12, 12, 12]
batch_pfos_fg [14, 14, 14]
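
Note that the fixed slice only works while the experiment number is a single digit. If experiment 10 or later ever shows up, something like str.partition may be safer, since it cuts at the first underscore regardless of how many digits come before it. A minimal sketch, where the helper name strip_experiment_number and the two sample keys are made up for illustration:

```python
def strip_experiment_number(key):
    """Return the part of the key after the first underscore."""
    _number, _sep, rest = key.partition("_")
    return rest


# hypothetical keys with a one-digit and a two-digit experiment number
d_multi = {'9_batch_pfoa_fg': [1, 1, 1], '10_batch_pfoa_fg': [2, 2, 2]}

d_multi_new = {}
for k, v in d_multi.items():
    group = strip_experiment_number(k)
    # building a new list with + leaves the original lists unchanged
    d_multi_new[group] = d_multi_new.get(group, []) + v

print(d_multi_new)
# {'batch_pfoa_fg': [1, 1, 1, 2, 2, 2]}
```

With k[2:], the second key would become '_batch_pfoa_fg' and the two runs would not be merged.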