I'm trying to match chemical formulas using regular expressions.
The formulas look like 'TiO 2-x N x', where Ti, O and N can each be any of the 118 chemical elements. Therefore, something like [A-Za-z] or (Ti|O|N) won't work.
How can I construct a regular expression that strictly matches a formula like 'TiO 2-x N x' but does not match a string like 'TiandO 2-x N x'?
CodePudding user response:
Something like (((Ti|O|N) ) (\d -)?x ?) should be able to exactly match TiO 2-x N x without matching TiandO 2-x N x.
I am unsure whether x in your example is literally the letter x or a placeholder that can be any word, in which case it would be (((Ti|O|N) ) (\d -)?\w ?).
The (Ti|O|N) part can be extended to include all 118 elements.
The final regex would then be:
(((H|He|Li|Be|B|C|N|O|F|Ne|Na|Mg|Al|Si|P|S|Cl|Ar|K|Ca|Sc|Ti|V|Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr|Rb|Sr|Y|Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I|Xe|Cs|Ba|La|Ce|Pr|Nd|Pm|Sm|Eu|Gd|Tb|Dy|Ho|Er|Tm|Yb|Lu|Hf|Ta|W|Re|Os|Ir|Pt|Au|Hg|Tl|Pb|Bi|Po|At|Rn|Fr|Ra|Ac|Th|Pa|U|Np|Pu|Am|Cm|Bk|Cf|Es|Fm|Md|No|Lr|Rf|Db|Sg|Bh|Hs|Mt|Ds|Rg|Cn|Nh|Fl|Mc|Lv|Ts|Og) ) (\d -)?\w ?)
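For reference, here is a minimal Python sketch of the same idea. It assumes the formula is written with single spaces exactly as in 'TiO 2-x N x', that x is a literal letter, and that a subscript is either x, a number, or number-x. The names ELEMENTS, FORMULA and is_formula are made up for illustration, and the pattern is one possible way to write it, not necessarily the exact regex above.

```python
import re

# The 118 element symbols, in the same order as the alternation above.
ELEMENTS = (
    "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co "
    "Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb "
    "Te I Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re "
    "Os Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es "
    "Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og"
).split()

# Try two-letter symbols first so that e.g. "He" is attempted before "H".
element = "(?:" + "|".join(sorted(ELEMENTS, key=len, reverse=True)) + ")"

# Assumed subscript forms: "x", "2", or "2-x".
subscript = r"(?:\d+-)?(?:x|\d+)"

# A term is one or more element symbols, optionally followed by " <subscript>".
term = rf"{element}+(?: {subscript})?"

# A formula is one or more terms separated by single spaces.
FORMULA = re.compile(rf"{term}(?: {term})*")


def is_formula(text: str) -> bool:
    """Return True only if the whole string is a formula (no partial matches)."""
    return FORMULA.fullmatch(text) is not None


print(is_formula("TiO 2-x N x"))     # True
print(is_formula("TiandO 2-x N x"))  # False
```

Using fullmatch (or anchoring the pattern with ^ and $) is what rejects 'TiandO 2-x N x': without it, a search could still find a valid piece such as 'O 2-x N x' inside the longer string.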