I am working on a small project: a tool that analyses data from chemical experiments.
I imported the data from CSV and made some adjustments with Pandas until I ended up with this dictionary:
{'6_batch_pfoa_zif-8-fg-6': [1, 1, 1],
'6_batch_pfoa_zif-8-fg-24': [2, 2, 2],
'6_batch_genx_zif-8-fg-6': [3, 3, 3],
'6_batch_genx_zif-8-fg-24': [4, 4, 4],
'6_batch_pfos_zif-8-fg-6': [5, 5, 5],
'6_batch_pfos_zif-8-fg-24': [6, 6, 6],
'7_batch_pfoa_zif-8-fg-6': [7, 7, 7],
'7_batch_pfoa_zif-8-fg-24': [8, 8, 8],
'7_batch_pfos_zif-8-fg-6': [9, 9, 9],
'8_batch_pfoa_zif-8-fg-6': [10, 10, 10],
'8_batch_pfoa_fg': [11, 11, 11],
'8_batch_genx_fg': [12, 12, 12],
'8_batch_pfos_zif-8-fg-6': [13, 13, 13],
'8_batch_pfos_fg': [14, 14, 14]}
Each key represents the following:
6: number of experiment
batch: type of experiment
pfoa: type of chemical
zif-8-fg-6: type of metal used
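To make that layout concrete, here is a minimal sketch that splits one key into its four parts. The helper name parse_key and the field names are only illustrative assumptions, and it assumes every key has exactly four underscore-separated fields:

```python
def parse_key(key):
    """Split an experiment key into its four labeled parts (hypothetical helper)."""
    # maxsplit=3 keeps any extra underscores inside the last field
    number, kind, chemical, metal = key.split("_", 3)
    return {
        "number": int(number),   # experiment number, e.g. 6
        "type": kind,            # type of experiment, e.g. 'batch'
        "chemical": chemical,    # type of chemical, e.g. 'pfoa'
        "metal": metal,          # type of metal used, e.g. 'zif-8-fg-6'
    }

parsed = parse_key("6_batch_pfoa_zif-8-fg-6")
print(parsed)
# {'number': 6, 'type': 'batch', 'chemical': 'pfoa', 'metal': 'zif-8-fg-6'}
```

The hyphens inside the metal name are untouched because only underscores are used as separators.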
Without going into too much detail, some experiments share the same variables and differ only in the experiment number. For example:
'7_batch_pfoa_zif-8-fg-6': [7, 7, 7]
'8_batch_pfoa_zif-8-fg-6': [10, 10, 10]
That means that experiment number 8 is a redo of experiment number 7.
I want to write code that checks whether the type of experiment, type of chemical, and type of metal used are the same. If they are, it should concatenate them into one key with six values. As an example:
'7_batch_pfoa_zif-8-fg-6': [7, 7, 7]
'8_batch_pfoa_zif-8-fg-6': [10, 10, 10]
should turn into:
'7_batch_pfoa_zif-8-fg-6': [7, 7, 7, 10, 10, 10]
The experiment number is no longer relevant after merging the keys. It can be anything.
I tried to split the name with split_name = point.split('_') and compare split_name[1], split_name[2], and split_name[3], but I couldn't figure out how to loop over the keys in the dictionary and compare each one to the other keys.
Thanks in advance!
CodePudding user response:
Dictionaries can be iterated over by key, value, or both (via items()). The code below takes advantage of that. The only tricky part is that I use an intermediate dictionary, data_mapper, to facilitate the transforms between raw_data and merged_data.
from pprint import pprint
raw_data = {
    '6_batch_pfoa_zif-8-fg-6': [1, 1, 1],
    '6_batch_pfoa_zif-8-fg-24': [2, 2, 2],
    '6_batch_genx_zif-8-fg-6': [3, 3, 3],
    '6_batch_genx_zif-8-fg-24': [4, 4, 4],
    '6_batch_pfos_zif-8-fg-6': [5, 5, 5],
    '6_batch_pfos_zif-8-fg-24': [6, 6, 6],
    '7_batch_pfoa_zif-8-fg-6': [7, 7, 7],
    '7_batch_pfoa_zif-8-fg-24': [8, 8, 8],
    '7_batch_pfos_zif-8-fg-6': [9, 9, 9],
    '8_batch_pfoa_zif-8-fg-6': [10, 10, 10],
    '8_batch_pfoa_fg': [11, 11, 11],
    '8_batch_genx_fg': [12, 12, 12],
    '8_batch_pfos_zif-8-fg-6': [13, 13, 13],
    '8_batch_pfos_fg': [14, 14, 14]
}
data_mapper = {}
merged_data = {}
for raw_key, raw_values in raw_data.items():
    # create a key excluding experiment number
    mapper_key = tuple(raw_key.split("_")[1:])
    # extend merged data entry if already exists
    if mapper_key in data_mapper:
        merged_key = data_mapper[mapper_key]
        merged_data[merged_key].extend(raw_values)
    # create merged data entry if does not exist
    else:
        data_mapper[mapper_key] = raw_key
        # copy the list so later extend() calls do not modify raw_data
        merged_data[raw_key] = list(raw_values)
pprint(merged_data)
Output:
{'6_batch_genx_zif-8-fg-24': [4, 4, 4],
'6_batch_genx_zif-8-fg-6': [3, 3, 3],
'6_batch_pfoa_zif-8-fg-24': [2, 2, 2, 8, 8, 8],
'6_batch_pfoa_zif-8-fg-6': [1, 1, 1, 7, 7, 7, 10, 10, 10],
'6_batch_pfos_zif-8-fg-24': [6, 6, 6],
'6_batch_pfos_zif-8-fg-6': [5, 5, 5, 9, 9, 9, 13, 13, 13],
'8_batch_genx_fg': [12, 12, 12],
'8_batch_pfoa_fg': [11, 11, 11],
'8_batch_pfos_fg': [14, 14, 14]}
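Since the experiment number no longer matters after merging, a possible variation is to drop it from the merged keys and skip the intermediate mapper. This is only a sketch of that alternative, not a replacement for the code above; it uses collections.defaultdict on a shortened copy of the data, and the names grouped and merged are made up for the example:

```python
from collections import defaultdict

# shortened sample of the data from the question
raw_data = {
    "7_batch_pfoa_zif-8-fg-6": [7, 7, 7],
    "8_batch_pfoa_zif-8-fg-6": [10, 10, 10],
    "8_batch_pfoa_fg": [11, 11, 11],
}

grouped = defaultdict(list)
for raw_key, raw_values in raw_data.items():
    # split off the experiment number once; the rest identifies the group
    _, group_key = raw_key.split("_", 1)
    # extend a fresh list, so the lists in raw_data stay untouched
    grouped[group_key].extend(raw_values)

merged = dict(grouped)
print(merged)
# {'batch_pfoa_zif-8-fg-6': [7, 7, 7, 10, 10, 10], 'batch_pfoa_fg': [11, 11, 11]}
```

The trade-off is that the merged keys lose the experiment number entirely, whereas the version above keeps the first key it saw for each group.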
CodePudding user response:
This method creates a new dict to keep the unique items. It keeps the relevant part of each key using the slice [2:], which assumes a single-digit experiment number.
d = {'6_batch_pfoa_zif-8-fg-6': [1, 1, 1],
'6_batch_pfoa_zif-8-fg-24': [2, 2, 2],
'6_batch_genx_zif-8-fg-6': [3, 3, 3],
'6_batch_genx_zif-8-fg-24': [4, 4, 4],
'6_batch_pfos_zif-8-fg-6': [5, 5, 5],
'6_batch_pfos_zif-8-fg-24': [6, 6, 6],
'7_batch_pfoa_zif-8-fg-6': [7, 7, 7],
'7_batch_pfoa_zif-8-fg-24': [8, 8, 8],
'7_batch_pfos_zif-8-fg-6': [9, 9, 9],
'8_batch_pfoa_zif-8-fg-6': [10, 10, 10],
'8_batch_pfoa_fg': [11, 11, 11],
'8_batch_genx_fg': [12, 12, 12],
'8_batch_pfos_zif-8-fg-6': [13, 13, 13],
'8_batch_pfos_fg': [14, 14, 14]}
d_new = {}
for k, v in d.items():
    # k[2:] drops the one-digit experiment number and the underscore
    if k[2:] in d_new:
        d_new[k[2:]] = d_new[k[2:]] + v
    else:
        d_new[k[2:]] = v
for k, v in d_new.items():
    print(k, v)
result:
batch_pfoa_zif-8-fg-6 [1, 1, 1, 7, 7, 7, 10, 10, 10]
batch_pfoa_zif-8-fg-24 [2, 2, 2, 8, 8, 8]
batch_genx_zif-8-fg-6 [3, 3, 3]
batch_genx_zif-8-fg-24 [4, 4, 4]
batch_pfos_zif-8-fg-6 [5, 5, 5, 9, 9, 9, 13, 13, 13]
batch_pfos_zif-8-fg-24 [6, 6, 6]
batch_pfoa_fg [11, 11, 11]
batch_genx_fg [12, 12, 12]
batch_pfos_fg [14, 14, 14]
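One caveat: k[2:] only strips the prefix correctly when the experiment number is a single digit. Assuming the numbers could reach 10 or more, a sketch of a safer way to drop the prefix is str.partition, which splits at the first underscore regardless of how long the number is. The helper name strip_experiment_number is hypothetical:

```python
def strip_experiment_number(key):
    """Return the key without its leading experiment number (hypothetical helper)."""
    # partition splits at the first underscore only: (number, "_", rest)
    _number, _sep, rest = key.partition("_")
    return rest

print(strip_experiment_number("8_batch_pfoa_fg"))   # batch_pfoa_fg
print(strip_experiment_number("12_batch_pfoa_fg"))  # batch_pfoa_fg
print("12_batch_pfoa_fg"[2:])                       # _batch_pfoa_fg
```

In the loop above, k[2:] could then be swapped for strip_experiment_number(k) without changing the result for the data shown.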