I have a excel formula and i am extracting list of cell address from that formula and store in list. my code is working fine but when there is $ sign with cell address then regex fail to extract that cell address.
here is my sample code
List<string> lstConstituent = null;
string formula "=S$10 X11";
lstConstituent = e.Cell.Formula.GetFormulaConstituents().Distinct().ToList();
Here is body of GetFormulaConstituents() function
public static List<string> GetFormulaConstituents(this string formula)
{
List<string> strCellAddress = new List<string>();
string rxCellPattern = @"(?<![$]) # match if prefix is absent: $ symbol (prevents matching $A1 type of cells)
# (if all you have is $A$1 type of references, and not $A1 types, this negative look-behind isn't needed)
\b # word boundary (prevents matching Excel functions with a similar pattern to a cell)
(?<col>[A-Z] ) # named capture group, match uppercase letter at least once
# (change to [A-Za-z] if you have lowercase cells)
(?<row>\d ) # named capture group, match a number at least once
\b # word boundary
";
Regex rxCell = new Regex(rxCellPattern, RegexOptions.IgnorePatternWhitespace);
if (rxCell.IsMatch(formula))
{
//Console.WriteLine("Formula: {0}", formula);
foreach (Match cell in rxCell.Matches(formula))
{
strCellAddress.Add(cell.Value);
}
}
return strCellAddress;
}
Though I removed this (?<![$]) from regex but still could not extract the cell address S$10 from formula.
what to alter in my routine? as a result i should be able to extract cell address with $ sign.
CodePudding user response:
You can match an optional dollar sign in in the first group for example, without using a lookbehind assertion:
\b(?<col>[A-Z] \$?)(?<row>\d )\b