Home > Blockchain >  Determine total number of atoms in a chemical formula
Determine total number of atoms in a chemical formula

Time:10-21

Apologies if this has already been asked and answered but I couldn't find a satisfactory answer.

I have a list of thousands of chemical formulas that could include symbols for any element and I would like to determine the total number of atoms of any element in each formula. Examples include:

  • CH3NO3
  • CSe2
  • C2Cl2
  • C2Cl2O2
  • C2Cl3F
  • C2H2BrF3
  • C2H2Br2
  • C2H3Cl3Si

I just want the total number of atoms in a single formula, so for the first example (CH3NO3), the answer would be 8 (1 carbon 3 hydrogens 1 nitrogen 3 oxygens).

I have found some useful code by PEH (enter image description here

Recognize also chemical formulas with prenthesis like Ca(OH)₂

If you need a more precise way (checking the existance of the Elements) and recognizing parenthesis you need to do it with RegEx again.

Because VBA doesn't support regular expressions out of the box we need to reference a Windows library first.

  1. Add reference to regex under Tools then References
    enter image description here

  2. and selecting Microsoft VBScript Regular Expression 5.5
    enter image description here

  3. Add this function to a module

    Public Function ChemRegexCountTotalElements(ByVal ChemFormula As String) As Long
        Dim RetVal As Long
    
        Dim regEx As New RegExp
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
        End With
    
        'first pattern matches every element once
        regEx.Pattern = "([A][cglmrstu]|[B][aehikr]?|[C][adeflmnorsu]?|[D][bsy]|[E][rsu]|[F][elmr]?|[G][ade]|[H][efgos]?|[I][nr]?|[K][r]?|[L][airuv]|[M][cdgnot]|[N][abdehiop]?|[O][gs]?|[P][abdmortu]?|[R][abefghnu]|[S][bcegimnr]?|[T][abcehilms]|[U]|[V]|[W]|[X][e]|[Y][b]?|[Z][nr])([0-9]*)"
    
        Dim Matches As MatchCollection
        Set Matches = regEx.Execute(ChemFormula)
    
        Dim m As Match
        For Each m In Matches
            RetVal = RetVal   IIf(Not m.SubMatches(1) = vbNullString, m.SubMatches(1), 1)
        Next m
    
        'second patternd finds parenthesis and multiplies elements within
        regEx.Pattern = "(\((. ?)\)([0-9] ) ) ?"
        Set Matches = regEx.Execute(ChemFormula)
        For Each m In Matches
            RetVal = RetVal   ChemRegexCountTotalElements(m.SubMatches(1)) * (m.SubMatches(2) - 1) '-1 because all elements were already counted once in the first pattern
        Next m
    
        ChemRegexCountTotalElements = RetVal
    End Function
    

While this code will also recognize parenthesis, note that it does not recognize nested parenthesis.

enter image description here

CodePudding user response:

Here's my two cent's

enter image description here

Formula in C1:

=ChemRegex(A1)

Where ChemRegex() calls:

Public Function ChemRegex(ChemFormula As String) As Long

With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = "[A-Z][a-z]*(\d*)"
    If .Test(ChemFormula) Then
        Set matches = .Execute(ChemFormula)
        For Each Match In matches
            ChemRegex = ChemRegex   IIf(Match.Submatches(0) = "", 1, Match.Submatches(0))
        Next
    Else
        ChemRegex = 0
    End If
End With

End Function

Or in a (shorter) 2-step regex-solution:

Public Function ChemRegex(ChemFormula As String) As Long

With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = "([A-Za-z])(?=[A-Z]|$)"
    ChemFormula = .Replace(ChemFormula, "$1-1")
    .Pattern = "\D "
    ChemFormula = .Replace(ChemFormula, " ")
    ChemRegex = Evaluate(ChemFormula)
End With

End Function
  • Related