Apologies if this has already been asked and answered but I couldn't find a satisfactory answer.
I have a list of thousands of chemical formulas that could include symbols for any element and I would like to determine the total number of atoms of any element in each formula. Examples include:
- CH3NO3
- CSe2
- C2Cl2
- C2Cl2O2
- C2Cl3F
- C2H2BrF3
- C2H2Br2
- C2H3Cl3Si
I just want the total number of atoms in a single formula, so for the first example (CH3NO3), the answer would be 8 (1 carbon 3 hydrogens 1 nitrogen 3 oxygens).
I have found some useful code by PEH (
Recognize also chemical formulas with prenthesis like Ca(OH)₂
If you need a more precise way (checking the existance of the Elements) and recognizing parenthesis you need to do it with RegEx again.
Because VBA doesn't support regular expressions out of the box we need to reference a Windows library first.
Add this function to a module
Public Function ChemRegexCountTotalElements(ByVal ChemFormula As String) As Long Dim RetVal As Long Dim regEx As New RegExp With regEx .Global = True .MultiLine = True .IgnoreCase = False End With 'first pattern matches every element once regEx.Pattern = "([A][cglmrstu]|[B][aehikr]?|[C][adeflmnorsu]?|[D][bsy]|[E][rsu]|[F][elmr]?|[G][ade]|[H][efgos]?|[I][nr]?|[K][r]?|[L][airuv]|[M][cdgnot]|[N][abdehiop]?|[O][gs]?|[P][abdmortu]?|[R][abefghnu]|[S][bcegimnr]?|[T][abcehilms]|[U]|[V]|[W]|[X][e]|[Y][b]?|[Z][nr])([0-9]*)" Dim Matches As MatchCollection Set Matches = regEx.Execute(ChemFormula) Dim m As Match For Each m In Matches RetVal = RetVal IIf(Not m.SubMatches(1) = vbNullString, m.SubMatches(1), 1) Next m 'second patternd finds parenthesis and multiplies elements within regEx.Pattern = "(\((. ?)\)([0-9] ) ) ?" Set Matches = regEx.Execute(ChemFormula) For Each m In Matches RetVal = RetVal ChemRegexCountTotalElements(m.SubMatches(1)) * (m.SubMatches(2) - 1) '-1 because all elements were already counted once in the first pattern Next m ChemRegexCountTotalElements = RetVal End Function
While this code will also recognize parenthesis, note that it does not recognize nested parenthesis.
CodePudding user response:
Here's my two cent's
Formula in C1
:
=ChemRegex(A1)
Where ChemRegex()
calls:
Public Function ChemRegex(ChemFormula As String) As Long
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "[A-Z][a-z]*(\d*)"
If .Test(ChemFormula) Then
Set matches = .Execute(ChemFormula)
For Each Match In matches
ChemRegex = ChemRegex IIf(Match.Submatches(0) = "", 1, Match.Submatches(0))
Next
Else
ChemRegex = 0
End If
End With
End Function
Or in a (shorter) 2-step regex-solution:
Public Function ChemRegex(ChemFormula As String) As Long
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "([A-Za-z])(?=[A-Z]|$)"
ChemFormula = .Replace(ChemFormula, "$1-1")
.Pattern = "\D "
ChemFormula = .Replace(ChemFormula, " ")
ChemRegex = Evaluate(ChemFormula)
End With
End Function