I have a first column containing around 4920 different chemical compounds.
For example:
0 Ag(AuS)2
1 Ag(W3Br7)2
2 Ag0.5Ge1Pb1.75S4
3 Ag0.5Ge1Pb1.75Se4
4 Ag2BBr
... ...
4916 ZrTaN3
4917 ZrTe
4918 ZrTi2O
4919 ZrTiF6
4920 ZrW2
I have a second column that lists all the elements of the periodic table in order of atomic number:
0 H
1 He
2 Li
3 Be
4 B
.. ...
113 Fl
114 Uup
115 Lv
116 Uus
117 Uuo
How can I classify the first column into groups based on each compound's first element, using that element's atomic number from column 2, so that the first column is returned as shown below?
The atomic number of Ag = 47. The atomic number of Zr = 40.
0 47
1 47
2 47
3 47
4 47
... ...
4916 40
4917 40
4918 40
4919 40
4920 40
CodePudding user response:
Since the first element's symbol can vary in length, the simplest solution is to use a regex to extract it. For example:
import re

compounds = ["Ag(AuS)2", "HTiF", "ZrTaN3"]
for compound in compounds:
    # An element symbol is one uppercase letter followed by optional lowercase letters
    match = re.match(r"[A-Z][a-z]*", compound)
    if match:
        first_element = match.group(0)
        print(first_element)
This will print out the first element of each compound. Note: if some compounds are more complex and you need to adjust your regex, I recommend using https://regex101.com/ as a playground.
Once you have that information, it just needs to be connected to the element in the second column. This is easiest if you map that column to a dictionary resembling:
{"H": 0, "He": 1, "Li": 2, ...}
which lets you get the element's index by calling dict_with_elements.get(first_element). Because that index starts at 0, add 1 to get the atomic number: Ag sits at index 46 and has atomic number 47.
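A minimal sketch of that dictionary, assuming the second column is available as a plain Python list ordered by atomic number. The list here is truncated to five symbols for illustration, and the helper name atomic_number is hypothetical:

```python
# Hypothetical, truncated stand-in for the second column (ordered by atomic number).
elements = ["H", "He", "Li", "Be", "B"]

# Map each symbol to its 0-based position in the column.
dict_with_elements = {symbol: index for index, symbol in enumerate(elements)}


def atomic_number(symbol):
    """Return position + 1 for a known symbol, or None if it is not in the mapping."""
    index = dict_with_elements.get(symbol)
    return None if index is None else index + 1


print(atomic_number("Li"))  # 3
print(atomic_number("Xx"))  # None
```

Using .get() rather than square brackets means an unknown symbol yields None instead of raising a KeyError, which is convenient when scanning thousands of formulas.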
From there on, the rest is just looping and writing data. I hope this helps.
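Putting the pieces together could look roughly like this. It is only a sketch: the symbol list stops at Ag to keep the example short, the function name first_atomic_numbers is made up for illustration, and the compounds are assumed to be in a plain list rather than a DataFrame column:

```python
import re

# Illustrative stand-in for the second column: only the first 47 symbols (H through Ag).
SYMBOLS = (
    "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe "
    "Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag"
).split()

# Symbol -> atomic number (0-based list position + 1).
ATOMIC_NUMBER = {symbol: i + 1 for i, symbol in enumerate(SYMBOLS)}


def first_atomic_numbers(compounds):
    """Return the atomic number of each compound's leading element (None if not found)."""
    result = []
    for compound in compounds:
        match = re.match(r"[A-Z][a-z]*", compound)
        symbol = match.group(0) if match else None
        result.append(ATOMIC_NUMBER.get(symbol))
    return result


compounds = ["Ag(AuS)2", "Ag0.5Ge1Pb1.75S4", "ZrTaN3", "ZrW2"]
print(first_atomic_numbers(compounds))  # [47, 47, 40, 40]
```

A formula that does not start with an element symbol (for example one that opens with a parenthesis) comes back as None here, so those rows are easy to spot and handle separately.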