Convert elements in a list to a dictionary

I would like to convert this data into a dictionary so I can work with it. Each element looks like a key-value pair from a dictionary, but the key and value are combined into a single string.

Here's a sample of the data:

['"acetic anydride": "[CX3](=[OX1])[OX2][CX3](=[OX1])",\n',
 '"acetylenic carbon": "[$([CX2]#C)]",\n',
 '"acyl bromide": "[CX3](=[OX1])[Br]",\n',
 '"acyl chloride": "[CX3](=[OX1])[Cl]",\n',
 '"acyl fluoride": "[CX3](=[OX1])[F]",\n',
 '"acyl iodide": "[CX3](=[OX1])[I]",\n',
 '"aldehyde": "[CX3H1](=O)[#6]",\n',
 '"alkane": "[CX4]",\n',
 '"allenic carbon": "[$([CX2](=C)=C)]",\n',
 '"amide": "[NX3][CX3](=[OX1])[#6]",\n',
 '"amidium": "[NX3][CX3]=[NX3+]",\n',
 '"amino acid": "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]",\n',
 '"azide": "[$(-[NX2-]-[NX2+]#[NX1]),$(-[NX2]=[NX2+]=[NX1-])]",\n',
 '"azo nitrogen": "[NX2]=N",\n',
 '"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n',
 '"azoxy nitrogen": "[$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]",\n',
 '"diazene": "[NX2]=[NX2]",\n',
 '"diazo nitrogen": "[$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]",\n',
 '"bromine": "[Br]",\n']

I tried removing the ':' from each element with str.replace, but it didn't work:

i=0
for line in lines:
    a = lines[i]
    a.replace(":", "")
    lines[i] = a
    i += 1

CodePudding user response：

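d = {}
for line in lines:
    # Split on the first colon only; some values (e.g. "azole") contain ':' themselves.
    key, value = line.split(":", 1)
    # Trim the surrounding spaces, quotes, trailing comma, and newline.
    d[key.strip(' "')] = value.strip(' ",\n')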

CodePudding user response：

You can use eval:

ll = ['"acetic anydride": "[CX3](=[OX1])[OX2][CX3](=[OX1])",\n',
 '"acetylenic carbon": "[$([CX2]#C)]",\n',
 '"acyl bromide": "[CX3](=[OX1])[Br]",\n',
 '"acyl chloride": "[CX3](=[OX1])[Cl]",\n',
 '"acyl fluoride": "[CX3](=[OX1])[F]",\n',
 '"acyl iodide": "[CX3](=[OX1])[I]",\n',
 '"aldehyde": "[CX3H1](=O)[#6]",\n',
 '"alkane": "[CX4]",\n',
 '"allenic carbon": "[$([CX2](=C)=C)]",\n',
 '"amide": "[NX3][CX3](=[OX1])[#6]",\n',
 '"amidium": "[NX3][CX3]=[NX3+]",\n',
 '"amino acid": "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]",\n',
 '"azide": "[$(-[NX2-]-[NX2+]#[NX1]),$(-[NX2]=[NX2+]=[NX1-])]",\n',
 '"azo nitrogen": "[NX2]=N",\n',
 '"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n',
 '"azoxy nitrogen": "[$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]",\n',
 '"diazene": "[NX2]=[NX2]",\n',
 '"diazo nitrogen": "[$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]",\n',
 '"bromine": "[Br]",\n']

dd = eval('{' + ' '.join(ll).replace('\n', '') + '}')

This joins the list into a single string, removes the \n characters, and adds the curly braces. The result is a string containing valid Python code for a dictionary literal, which eval can then evaluate.
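Because eval will run any Python expression it is handed, a more cautious variant of the same idea is to parse the string with ast.literal_eval from the standard library, which only accepts literals such as strings, numbers, lists, and dicts. The following is a minimal sketch on a shortened two-element sample; the variable name text is only illustrative.

```python
import ast

# Two sample entries in the same shape as the list in the question.
ll = [
    '"alkane": "[CX4]",\n',
    '"azo nitrogen": "[NX2]=N",\n',
]

# Build the same '{...}' string, then parse it as a literal instead of executing it.
text = '{' + ' '.join(ll).replace('\n', '') + '}'
dd = ast.literal_eval(text)

print(dd)
```

If an element ever contained something other than a plain literal, literal_eval would raise an exception rather than execute it.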

CodePudding user response：

Each element in the list is a string ending in ',\n'. These should be removed. The keys and values have unnecessary double-quotes. These should also be removed. I think this should give you what you need:

mylist = ['"acetic anydride": "[CX3](=[OX1])[OX2][CX3](=[OX1])",\n',
 '"acetylenic carbon": "[$([CX2]#C)]",\n',
 '"acyl bromide": "[CX3](=[OX1])[Br]",\n',
 '"acyl chloride": "[CX3](=[OX1])[Cl]",\n',
 '"acyl fluoride": "[CX3](=[OX1])[F]",\n',
 '"acyl iodide": "[CX3](=[OX1])[I]",\n',
 '"aldehyde": "[CX3H1](=O)[#6]",\n',
 '"alkane": "[CX4]",\n',
 '"allenic carbon": "[$([CX2](=C)=C)]",\n',
 '"amide": "[NX3][CX3](=[OX1])[#6]",\n',
 '"amidium": "[NX3][CX3]=[NX3+]",\n',
 '"amino acid": "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]",\n',
 '"azide": "[$(-[NX2-]-[NX2+]#[NX1]),$(-[NX2]=[NX2+]=[NX1-])]",\n',
 '"azo nitrogen": "[NX2]=N",\n',
 '"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n',
 '"azoxy nitrogen": "[$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]",\n',
 '"diazene": "[NX2]=[NX2]",\n',
 '"diazo nitrogen": "[$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]",\n',
 '"bromine": "[Br]",\n']

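mydict = dict()
for e in mylist:
    # Split on the first colon only; some values (e.g. "azole") contain ':' themselves.
    key, value = e.replace('"', '').split(':', 1)
    # Drop the trailing ',\n' and any surrounding whitespace from the value.
    mydict[key] = value[:-2].strip()

print(mydict)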
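Since each element already has the shape of a JSON key/value pair, another option is to let a parser deal with the quotes. The sketch below assumes that no key or value contains a backslash or an embedded double quote (true for the sample shown, but not checked for the full data set), and uses a shortened two-element list.

```python
import json

# Shortened sample in the same shape as the list above.
mylist = [
    '"alkane": "[CX4]",\n',
    '"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n',
]

mydict = {}
for e in mylist:
    # Strip the trailing comma and newline, wrap the pair in braces,
    # and parse it as a JSON object holding a single key.
    mydict.update(json.loads('{' + e.rstrip(',\n') + '}'))

print(mydict)
```

Colons inside a value, as in the "azole" pattern, are not a problem here because the parser only treats the colon outside the quoted strings as a separator.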

CodePudding user response：

This is a formatting problem or, more precisely, a data-cleaning one. I am not sure why you are using an increment variable. The first thing I would handle is the trailing comma and newline at the end of each element; then split on ': ' and build the dictionary from the two parts. You can try the code below.

d = {}
for element in lines:
    element = element.rstrip(",\n")
    key, value = element.split(": ")
    d[key.strip('"')] = value.strip('"')
d

I have used strip('"') to remove the quotation marks around each key and value.
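As for why the loop in the question appeared to do nothing: Python strings are immutable, so str.replace returns a new string and leaves the original unchanged. A minimal sketch on a two-element sample, using enumerate in place of a manual counter:

```python
lines = ['"alkane": "[CX4]",\n', '"bromine": "[Br]",\n']

# str.replace returns a new string; the result is discarded here,
# so `a` still holds the original text.
a = lines[0]
a.replace(":", "")
unchanged = (a == lines[0])

# Assigning the result back is what makes the change visible.
# enumerate supplies the index, so no separate counter is needed.
for i, line in enumerate(lines):
    lines[i] = line.replace(":", "")

print(unchanged)
print(lines)
```

Removing the colon this way still leaves plain strings rather than a dictionary, which is why the split-based approach above is the one to use.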