Home > Software design >  regex: How to construct a list of key words to match?
regex: How to construct a list of key words to match?

Time:04-11

I'm trying to match chemical formulas using regular expressions.

The chemical formulas can be 'TiO 2-x N x', in which Ti, O and N are all one of the 118 chemical elements. Therefore, something like [A-Za-z] or (Ti|O|N) won't work.

How can I construct a regular expression that can strictly match a formula like 'TiO 2-x N x', and a string like 'TiandO 2-x N x' won't be matched?

CodePudding user response:

Something like (((Ti|O|N) ) (\d -)?x ?) should be able to exactly match TiO 2-x N x without matching TiandO 2-x N x.

I am unsure of if x in your example is literally a letter x or if it a symbol that can be any word, in which case it would be (((Ti|O|N) ) (\d -)?\w ?) .

The (Ti|O|N) part can be extended to include all 118 elements.

The final regex would then be

(((H|He|Li|Be|B|C|N|O|F|Ne|Na|Mg|Al|Si|P|S|Cl|Ar|K|Ca|Sc|Ti|V|Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr|Rb|Sr|Y|Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I|Xe|Cs|Ba|La|Ce|Pr|Nd|Pm|Sm|Eu|Gd|Tb|Dy|Ho|Er|Tm|Yb|Lu|Hf|Ta|W|Re|Os|Ir|Pt|Au|Hg|Tl|Pb|Bi|Po|At|Rn|Fr|Ra|Ac|Th|Pa|U|Np|Pu|Am|Cm|Bk|Cf|Es|Fm|Md|No|Lr|Rf|Db|Sg|Bh|Hs|Mt|Ds|Rg|Cn|Nh|Fl|Mc|Lv|Ts|Og) ) (\d -)?\w ?) 
  • Related