How to concatenate 2 keys in the same dictionary if they have similar names?

I am working on a small project: a tool that analyses data from chemical experiments.

I imported data from a CSV file and made some adjustments with Pandas until I ended up with this dictionary:

{'6_batch_pfoa_zif-8-fg-6': [1, 1, 1],
 '6_batch_pfoa_zif-8-fg-24': [2, 2, 2],
 '6_batch_genx_zif-8-fg-6': [3, 3, 3],
 '6_batch_genx_zif-8-fg-24': [4, 4, 4],
 '6_batch_pfos_zif-8-fg-6': [5, 5, 5],
 '6_batch_pfos_zif-8-fg-24': [6, 6, 6],
 '7_batch_pfoa_zif-8-fg-6': [7, 7, 7],
 '7_batch_pfoa_zif-8-fg-24': [8, 8, 8],
 '7_batch_pfos_zif-8-fg-6': [9, 9, 9],
 '8_batch_pfoa_zif-8-fg-6': [10, 10, 10],
 '8_batch_pfoa_fg': [11, 11, 11],
 '8_batch_genx_fg': [12, 12, 12],
 '8_batch_pfos_zif-8-fg-6': [13, 13, 13],
 '8_batch_pfos_fg': [14, 14, 14]}

Each key is made of four parts separated by underscores. For example, in '6_batch_pfoa_zif-8-fg-6':
6: experiment number
batch: type of experiment
pfoa: type of chemical
zif-8-fg-6: type of metal used
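The key format described above can be parsed into named fields. A minimal sketch, assuming every key has exactly these four underscore-separated parts with the experiment number first; ExperimentKey and parse_key are hypothetical names, not part of the original code:

```python
from typing import NamedTuple


class ExperimentKey(NamedTuple):
    """Hypothetical container for the four parts of a key such as '6_batch_pfoa_zif-8-fg-6'."""
    experiment: int  # experiment number, e.g. 6
    kind: str        # type of experiment, e.g. 'batch'
    chemical: str    # type of chemical, e.g. 'pfoa'
    metal: str       # type of metal used, e.g. 'zif-8-fg-6'


def parse_key(key: str) -> ExperimentKey:
    # maxsplit=3 returns at most four parts; the last part keeps any remaining underscores
    # (assumption: the metal name is always the last field)
    number, kind, chemical, metal = key.split("_", 3)
    return ExperimentKey(int(number), kind, chemical, metal)


print(parse_key("6_batch_pfoa_zif-8-fg-6"))
# ExperimentKey(experiment=6, kind='batch', chemical='pfoa', metal='zif-8-fg-6')
```

Comparing parse_key(a)[1:] with parse_key(b)[1:] then checks exactly the three fields that must match, while ignoring the experiment number.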

Without going into too much detail, some experiments have the same variables and differ only in the experiment number (the leading number). For example:

'7_batch_pfoa_zif-8-fg-6': [7, 7, 7]
'8_batch_pfoa_zif-8-fg-6': [10, 10, 10]

This means that experiment 8 is a redo of experiment 7.

I want to write code that checks whether the type of experiment, the type of chemical, and the type of metal used are the same. If they are, it should concatenate the entries into one key with six values. For example:

'7_batch_pfoa_zif-8-fg-6': [7, 7, 7]
'8_batch_pfoa_zif-8-fg-6': [10, 10, 10]

should turn into:

'7_batch_pfoa_zif-8-fg-6': [7, 7, 7, 10, 10, 10]

The experiment number is no longer relevant after the keys are merged, so the merged key can use any of them.

I tried splitting the name with split_name = point.split('_') and comparing split_name[1], split_name[2] and split_name[3], but I couldn't figure out how to loop over the keys in the dictionary and compare each one to the other keys.

Thanks in advance!
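The looping part of the question can be answered directly with itertools.combinations, which yields every unordered pair of keys exactly once. A minimal sketch on a shortened copy of the dictionary (the name data is an assumption), comparing the split parts after the experiment number just as the question attempts:

```python
from itertools import combinations

data = {
    '7_batch_pfoa_zif-8-fg-6': [7, 7, 7],
    '7_batch_pfos_zif-8-fg-6': [9, 9, 9],
    '8_batch_pfoa_zif-8-fg-6': [10, 10, 10],
    '8_batch_pfoa_fg': [11, 11, 11],
}

# Collect every pair of keys whose type of experiment, chemical and metal agree
matching_pairs = []
for key_a, key_b in combinations(data, 2):
    # [1:] drops the experiment number and keeps the other three parts
    if key_a.split('_')[1:] == key_b.split('_')[1:]:
        matching_pairs.append((key_a, key_b))

print(matching_pairs)
# [('7_batch_pfoa_zif-8-fg-6', '8_batch_pfoa_zif-8-fg-6')]
```

This checks n(n-1)/2 pairs, which is fine for a few dozen keys. For the actual merge, grouping by the split parts in a single pass, as the answers below do, avoids the quadratic loop.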

Answer:

Dictionaries can be iterated over by key, by value, or by both at once with items(). The code below takes advantage of that. The only tricky part is the intermediate dictionary data_mapper, which maps each (type of experiment, chemical, metal) tuple to the first full key seen with that combination, so that raw_data can be folded into merged_data.

from pprint import pprint

raw_data = {
    '6_batch_pfoa_zif-8-fg-6': [1, 1, 1],
    '6_batch_pfoa_zif-8-fg-24': [2, 2, 2],
    '6_batch_genx_zif-8-fg-6': [3, 3, 3],
    '6_batch_genx_zif-8-fg-24': [4, 4, 4],
    '6_batch_pfos_zif-8-fg-6': [5, 5, 5],
    '6_batch_pfos_zif-8-fg-24': [6, 6, 6],
    '7_batch_pfoa_zif-8-fg-6': [7, 7, 7],
    '7_batch_pfoa_zif-8-fg-24': [8, 8, 8],
    '7_batch_pfos_zif-8-fg-6': [9, 9, 9],
    '8_batch_pfoa_zif-8-fg-6': [10, 10, 10],
    '8_batch_pfoa_fg': [11, 11, 11],
    '8_batch_genx_fg': [12, 12, 12],
    '8_batch_pfos_zif-8-fg-6': [13, 13, 13],
    '8_batch_pfos_fg': [14, 14, 14]
}

data_mapper = {}
merged_data = {}

for raw_key, raw_values in raw_data.items():

    # create a key excluding experiment number
    mapper_key = tuple(raw_key.split("_")[1:])
    
    # extend the merged data entry if it already exists
    if mapper_key in data_mapper:
        merged_key = data_mapper[mapper_key]
        merged_data[merged_key].extend(raw_values)

    # otherwise create the merged data entry; copy the list so that later
    # extend() calls do not also modify the lists stored in raw_data
    else:
        data_mapper[mapper_key] = raw_key
        merged_data[raw_key] = list(raw_values)

pprint(merged_data)

Output:

{'6_batch_genx_zif-8-fg-24': [4, 4, 4],
 '6_batch_genx_zif-8-fg-6': [3, 3, 3],
 '6_batch_pfoa_zif-8-fg-24': [2, 2, 2, 8, 8, 8],
 '6_batch_pfoa_zif-8-fg-6': [1, 1, 1, 7, 7, 7, 10, 10, 10],
 '6_batch_pfos_zif-8-fg-24': [6, 6, 6],
 '6_batch_pfos_zif-8-fg-6': [5, 5, 5, 9, 9, 9, 13, 13, 13],
 '8_batch_genx_fg': [12, 12, 12],
 '8_batch_pfoa_fg': [11, 11, 11],
 '8_batch_pfos_fg': [14, 14, 14]}
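The same grouping can also be written with collections.defaultdict, which removes the explicit if/else. This is a swapped-in alternative rather than the answerer's code: a minimal sketch on a shortened copy of raw_data, where first_key is a hypothetical helper dictionary that plays the role of data_mapper by remembering the first full key seen for each combination.

```python
from collections import defaultdict
from pprint import pprint

raw_data = {
    '6_batch_pfoa_zif-8-fg-6': [1, 1, 1],
    '7_batch_pfoa_zif-8-fg-6': [7, 7, 7],
    '7_batch_pfos_zif-8-fg-6': [9, 9, 9],
    '8_batch_pfoa_zif-8-fg-6': [10, 10, 10],
    '8_batch_pfoa_fg': [11, 11, 11],
}

# Group values by everything after the experiment number
groups = defaultdict(list)
first_key = {}  # hypothetical helper: combination tuple -> first full key seen
for raw_key, raw_values in raw_data.items():
    mapper_key = tuple(raw_key.split("_")[1:])
    first_key.setdefault(mapper_key, raw_key)
    # extend() copies the items into a list owned by groups, so raw_data is not mutated
    groups[mapper_key].extend(raw_values)

# Rename each group back to the first full key seen for that combination
merged_data = {first_key[k]: values for k, values in groups.items()}
pprint(merged_data)
```

Because every merged list is a fresh list created by defaultdict, the original lists in raw_data stay unchanged.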

Answer:

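This method creates a new dict with one entry per unique key. It keeps the relevant part of each key by slicing with [2:], which drops the experiment number and the underscore after it. Note that this assumes the experiment number is a single digit.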

d =  {'6_batch_pfoa_zif-8-fg-6': [1, 1, 1],
 '6_batch_pfoa_zif-8-fg-24': [2, 2, 2],
 '6_batch_genx_zif-8-fg-6': [3, 3, 3],
 '6_batch_genx_zif-8-fg-24': [4, 4, 4],
 '6_batch_pfos_zif-8-fg-6': [5, 5, 5],
 '6_batch_pfos_zif-8-fg-24': [6, 6, 6],
 '7_batch_pfoa_zif-8-fg-6': [7, 7, 7],
 '7_batch_pfoa_zif-8-fg-24': [8, 8, 8],
 '7_batch_pfos_zif-8-fg-6': [9, 9, 9],
 '8_batch_pfoa_zif-8-fg-6': [10, 10, 10],
 '8_batch_pfoa_fg': [11, 11, 11],
 '8_batch_genx_fg': [12, 12, 12],
 '8_batch_pfos_zif-8-fg-6': [13, 13, 13],
 '8_batch_pfos_fg': [14, 14, 14]}


d_new = {}
for k, v in d.items():
    if k[2:] in d_new:
        d_new[k[2:]] = d_new[k[2:]] + v
    else:
        d_new[k[2:]] = v

for k, v in d_new.items():
    print(k,v)

Result:

batch_pfoa_zif-8-fg-6 [1, 1, 1, 7, 7, 7, 10, 10, 10]
batch_pfoa_zif-8-fg-24 [2, 2, 2, 8, 8, 8]
batch_genx_zif-8-fg-6 [3, 3, 3]
batch_genx_zif-8-fg-24 [4, 4, 4]
batch_pfos_zif-8-fg-6 [5, 5, 5, 9, 9, 9, 13, 13, 13]
batch_pfos_zif-8-fg-24 [6, 6, 6]
batch_pfoa_fg [11, 11, 11]
batch_genx_fg [12, 12, 12]
batch_pfos_fg [14, 14, 14]
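Slicing with k[2:] only works while the experiment number is a single digit; a key such as '10_batch_pfoa_fg' would become '_batch_pfoa_fg' and no longer match its earlier runs. A sketch that instead drops everything up to the first underscore with str.partition, using a shortened input with a hypothetical two-digit experiment added for illustration:

```python
d = {
    '8_batch_pfoa_zif-8-fg-6': [10, 10, 10],
    '10_batch_pfoa_zif-8-fg-6': [15, 15, 15],  # hypothetical two-digit experiment
    '8_batch_pfoa_fg': [11, 11, 11],
}

d_new = {}
for k, v in d.items():
    # partition('_') splits at the first underscore only: ('10', '_', 'batch_pfoa_zif-8-fg-6')
    suffix = k.partition('_')[2]
    # list + list builds a new list, so the lists in d are left unchanged
    d_new[suffix] = d_new.get(suffix, []) + v

for k, v in d_new.items():
    print(k, v)
# batch_pfoa_zif-8-fg-6 [10, 10, 10, 15, 15, 15]
# batch_pfoa_fg [11, 11, 11]
```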