I would like to convert this data into a dictionary so I can work with it. Each element looks like a dictionary key and value, but both are combined into a single string.
Here's a sample of the data:
['"acetic anydride": "[CX3](=[OX1])[OX2][CX3](=[OX1])",\n',
'"acetylenic carbon": "[$([CX2]#C)]",\n',
'"acyl bromide": "[CX3](=[OX1])[Br]",\n',
'"acyl chloride": "[CX3](=[OX1])[Cl]",\n',
'"acyl fluoride": "[CX3](=[OX1])[F]",\n',
'"acyl iodide": "[CX3](=[OX1])[I]",\n',
'"aldehyde": "[CX3H1](=O)[#6]",\n',
'"alkane": "[CX4]",\n',
'"allenic carbon": "[$([CX2](=C)=C)]",\n',
'"amide": "[NX3][CX3](=[OX1])[#6]",\n',
'"amidium": "[NX3][CX3]=[NX3+]",\n',
'"amino acid": "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]",\n',
'"azide": "[$(-[NX2-]-[NX2+]#[NX1]),$(-[NX2]=[NX2+]=[NX1-])]",\n',
'"azo nitrogen": "[NX2]=N",\n',
'"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n',
'"azoxy nitrogen": "[$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]",\n',
'"diazene": "[NX2]=[NX2]",\n',
'"diazo nitrogen": "[$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]",\n',
'"bromine": "[Br]",\n']
I tried removing the ":" from the data with replace, but it didn't work:
i = 0
for line in lines:
    a = lines[i]
    a.replace(":", "")
    lines[i] = a
    i += 1
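The loop has no effect because Python strings are immutable: str.replace does not change a in place, it returns a new string, and this code discards that result. A minimal illustration, using a shortened two-line list made up from the sample purely for demonstration:

```python
# A shortened copy of the sample data, for illustration only
lines = ['"alkane": "[CX4]",\n', '"bromine": "[Br]",\n']

a = lines[0]
a.replace(":", "")   # returns a NEW string, which is thrown away here
print(":" in a)      # True: a itself is unchanged

# The result must be assigned back; enumerate replaces the manual counter
for i, line in enumerate(lines):
    lines[i] = line.replace(":", "")
print(lines)
```

Even with the colons removed, each element would still be one string rather than a key/value pair, so splitting each line into a key and a value is the more direct route to a dictionary.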
Answer:
d = {}
for line in lines:
    # Split only on the first colon: some values (e.g. "azole") contain colons too
    s = line.split(":", 1)
    d[s[0].strip(' "')] = s[1].strip(' ",\n')
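Some values contain colons themselves (the "azole" pattern uses ":" as the SMARTS aromatic-bond symbol), so splitting on every colon cuts such a value short. Limiting the split to the first colon with maxsplit=1 keeps the value intact. A small check on that one line:

```python
line = '"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n'

# A plain split breaks the value at every colon
parts = line.split(":")
print(len(parts))  # 5

# maxsplit=1 splits only at the first colon, between key and value
key, value = line.split(":", 1)
key = key.strip(' "')           # remove spaces and quotes around the key
value = value.strip(' ",\n')    # remove spaces, quotes, the comma and the newline
print(key, value)
```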
Answer:
You can use eval:
ll = ['"acetic anydride": "[CX3](=[OX1])[OX2][CX3](=[OX1])",\n',
'"acetylenic carbon": "[$([CX2]#C)]",\n',
'"acyl bromide": "[CX3](=[OX1])[Br]",\n',
'"acyl chloride": "[CX3](=[OX1])[Cl]",\n',
'"acyl fluoride": "[CX3](=[OX1])[F]",\n',
'"acyl iodide": "[CX3](=[OX1])[I]",\n',
'"aldehyde": "[CX3H1](=O)[#6]",\n',
'"alkane": "[CX4]",\n',
'"allenic carbon": "[$([CX2](=C)=C)]",\n',
'"amide": "[NX3][CX3](=[OX1])[#6]",\n',
'"amidium": "[NX3][CX3]=[NX3+]",\n',
'"amino acid": "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]",\n',
'"azide": "[$(-[NX2-]-[NX2+]#[NX1]),$(-[NX2]=[NX2+]=[NX1-])]",\n',
'"azo nitrogen": "[NX2]=N",\n',
'"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n',
'"azoxy nitrogen": "[$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]",\n',
'"diazene": "[NX2]=[NX2]",\n',
'"diazo nitrogen": "[$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]",\n',
'"bromine": "[Br]",\n']
dd = eval('{' + ' '.join(ll).replace('\n', '') + '}')
This joins the list into a single string, removes the newline characters, and wraps the result in curly braces. The resulting string is a valid Python dict literal, so eval turns it into a dictionary.
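Note that eval executes arbitrary Python code, which is risky if the data could ever come from an untrusted source. The standard library's ast.literal_eval only accepts literals (strings, numbers, dicts, lists and so on) and can be dropped in the same way. A sketch using a shortened list for illustration:

```python
import ast

# A shortened copy of the sample data, for illustration only
ll = ['"alkane": "[CX4]",\n',
      '"azo nitrogen": "[NX2]=N",\n',
      '"bromine": "[Br]",\n']

# Build the text of a dict literal: {"key": "value", ...}
# (the trailing comma before "}" is allowed in Python)
source = '{' + ' '.join(ll).replace('\n', '') + '}'

# literal_eval parses literals only; it never calls functions
dd = ast.literal_eval(source)
print(dd)
```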
Answer:
Each element in the list is a string ending in ',\n', which should be removed. The keys and values are also wrapped in double quotes that you don't need. This should give you what you need:
mylist = ['"acetic anydride": "[CX3](=[OX1])[OX2][CX3](=[OX1])",\n',
'"acetylenic carbon": "[$([CX2]#C)]",\n',
'"acyl bromide": "[CX3](=[OX1])[Br]",\n',
'"acyl chloride": "[CX3](=[OX1])[Cl]",\n',
'"acyl fluoride": "[CX3](=[OX1])[F]",\n',
'"acyl iodide": "[CX3](=[OX1])[I]",\n',
'"aldehyde": "[CX3H1](=O)[#6]",\n',
'"alkane": "[CX4]",\n',
'"allenic carbon": "[$([CX2](=C)=C)]",\n',
'"amide": "[NX3][CX3](=[OX1])[#6]",\n',
'"amidium": "[NX3][CX3]=[NX3+]",\n',
'"amino acid": "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]",\n',
'"azide": "[$(-[NX2-]-[NX2+]#[NX1]),$(-[NX2]=[NX2+]=[NX1-])]",\n',
'"azo nitrogen": "[NX2]=N",\n',
'"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n',
'"azoxy nitrogen": "[$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]",\n',
'"diazene": "[NX2]=[NX2]",\n',
'"diazo nitrogen": "[$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]",\n',
'"bromine": "[Br]",\n']
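mydict = dict()
for e in mylist:
    # Drop all double quotes, then split on the first colon only
    # (values such as "azole" contain colons themselves)
    t = e.replace('"', '').split(':', 1)
    # t[1] ends with ',\n'; slice those two characters off
    mydict[t[0]] = t[1][:-2].strip()
print(mydict)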
Answer:
This is a data-cleaning problem. You also don't need the index variable i, since a for loop already gives you each element. First strip the trailing comma and newline from each element, then split it on ': ' into a key and a value and store them in a dictionary. You can try the code below.
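d = {}
for element in lines:
    # Remove trailing ',' and '\n' characters
    element = element.rstrip(",\n")
    # Split only on the first ': ', which separates the key from the value
    key, value = element.split(": ", 1)
    d[key.strip('"')] = value.strip('"')
print(d)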
I used strip('"') to remove the double quotes around each key and value.
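Since each line already has the form of a JSON object member ("key": "value",), another option is to let the standard json module do the parsing: join the lines, drop the final trailing comma (JSON does not allow one), and wrap the result in braces. This sketch assumes, as in the sample, that no key or value contains characters that JSON would treat specially, such as backslashes; the shortened list is for illustration only.

```python
import json

# A shortened copy of the sample data, for illustration only
lines = ['"alkane": "[CX4]",\n',
         '"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n',
         '"bromine": "[Br]",\n']

# Join everything, strip the last newline and the final comma,
# then wrap in braces so the text forms a JSON object.
# The newlines between entries are ordinary JSON whitespace.
text = '{' + ''.join(lines).rstrip().rstrip(',') + '}'

d = json.loads(text)
print(d["azole"])
```

Colons inside values are not a problem here, because the JSON parser only treats a colon as a separator outside of quoted strings.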