Home > Enterprise >  OpenDocumentXML Excel exponentitally slows when inserting cell sharedstringtable values
OpenDocumentXML Excel exponentitally slows when inserting cell sharedstringtable values

Time:05-19

The standard approach in code samples is to iterate the sharedstringtable seraching for the value to be added. Add; if not found and use existing if the string exists. However, this searching slows dramatically after a couple of hundred values.

I used a LINQ query to search for the string and it appears faster. See commented out lines in the code. However, I could not determine the position of the found item. Does anyone know a method to identify the location of an item in a sharedstring table.

        public static int InsertSharedStringItem(string text, InfoReg.OpenXML.Spreadsheet mySpreadsheet)
    {
        // If the part does not contain a SharedStringTable, create one.
        if (mySpreadsheet.sharedstringtable == null)
        {
            mySpreadsheet.shareStringPart.SharedStringTable = new SharedStringTable();
        }

        //SharedStringItem item = mySpreadsheet.sharedstringtable.Elements<SharedStringItem>()
        //            .Where(t => t.InnerText == text)
        //            .FirstOrDefault();
        //if(item != null)
        //{
        //    return item's location in the shared string table;
        //}

        // Iterate through all the items in the SharedStringTable. If the text already exists, return its index.
        int i = 0;
        foreach (SharedStringItem item1 in mySpreadsheet.sharedstringtable.Elements<SharedStringItem>())
        {
            if (item1.InnerText == text)
            {
                return i;
            }
            i  ;
        }
        // The text does not exist in the part. Create the SharedStringItem and return its index.
        mySpreadsheet.sharedstringtable.AppendChild(new SharedStringItem(new DocumentFormat.OpenXml.Spreadsheet.Text(text)));
        return i;
    }

A sharedstringtable value is added using:

                index = InfoReg.OpenXML.Spreadsheet.InsertSharedStringItem(string.IsNullOrEmpty(datarow.AssignedTo) ? string.Empty : datarow.AssignedTo, myspreadsheet);
            // Insert cell into the new worksheet.
            cell = InfoReg.OpenXML.Spreadsheet.InsertCellInWorksheet(InfoReg.OpenXML.Spreadsheet.GetExcelColumnName(colAddr  ), rowAddr, myspreadsheet.worksheetPart.Worksheet);
            // Set the value of cell.
            cell.CellValue = new CellValue(index.ToString());
            cell.DataType = new EnumValue<CellValues>(CellValues.SharedString);

To limit the problem, I use a non-shared string approach for cells where matches have a low probability. I use this code:

                Cell cell = InfoReg.OpenXML.Spreadsheet.InsertCellInWorksheet(InfoReg.OpenXML.Spreadsheet.GetExcelColumnName(colAddr  ), rowAddr, myspreadsheet.worksheetPart.Worksheet);
            // Set the value of cell.
            cell.CellValue = new CellValue(datarow.ForeName);
            cell.DataType = new EnumValue<CellValues>(CellValues.String);

Some articles suggested using a Dictionary<string, int> with a copy of the shared string table. I did not see any significant improvement in performance with the Dictionary.

CodePudding user response:

The idea with the Dictionary should give you better performance, but if you're recreating the dictionary every time you want to insert an element, you won't save any time, in fact it'll only take longer.

The way I see it is to wrap the SharedStringTable in a wrapper which caches and keeps track of the elements and make sure that the wrapper is the ONLY way you access the shared string table.

public class SharedStringWrapper
{
    private readonly SharedStringTable _table;
    private readonly Lazy<Dictionary<string, (int index, OpenXmlElement element)>> _lazyDict;
    private int _maxElement;

    public SharedStringWrapper(SharedStringTable table)
    {
        _table = table;

        _maxElement = _table.Elements().Count();
        //lazy initialize, this could take a while, no reason to do it until we need it.
        //in this case we ignore the casing of the text
        _lazyDict = new Lazy<Dictionary<string, (int index, OpenXmlElement element)>>(
            _table.Select((element, index) => (element, index)).ToDictionary(k => k.element.InnerText, v => (v.index, itm: v.element)));
    }

    /// <summary>
    /// Inserts text into the shared string table and returns the index of the inserted text.
    /// If the text already exists, it returns the index of the existing element
    /// </summary>
    /// <param name="text">text to insert</param>
    /// <returns>index of the element in the shared strings table</returns>
    public int InsertTextElement(string text)
    {
        //this is where you get the huge time saving - first time it will take a while, subsequently it'll be way faster
        if (_lazyDict.Value.TryGetValue(text, out var value))
            return value.index;

        //append child and increment the count.
        _table.AppendChild(new SharedStringItem(new Text(text)));
        _maxElement  ;

        return _maxElement;
    }
}

CodePudding user response:

The modified code:

  public class SharedStringWrapper
{
    private readonly SharedStringTable _table;
    private readonly Dictionary<string, int> _dict;
    private int _maxElement;
    private int _emptyStringElement;

    public SharedStringWrapper(SharedStringTable table)
    {
        _table = table;

        _maxElement = _table.Elements().Count();
        //lazy initialize, this could take a while, no reason to do it until we need it.
        //in this case we ignore the casing of the text
        _dict = new Dictionary<string, int>();
        _table.AddChild(new SharedStringItem(new Text(string.Empty)));
        int i = 0;
        foreach (SharedStringItem item in _table.Elements<SharedStringItem>())
        {
            _dict.Add(item.InnerText, i  );
        }
        _emptyStringElement = InsertTextElement(string.Empty) - 1;
    }

    /// <summary>
    /// Inserts text into the shared string table and returns the index of the inserted text.
    /// If the text already exists, it returns the index of the existing element
    /// </summary>
    /// <param name="text">text to insert</param>
    /// <returns>index of the element in the shared strings table</returns>
    public int InsertTextElement(string text)
    {
        if (text == null) return _emptyStringElement;
        int indx = 1;
        try
        {
            indx = _dict[text]   1;
        }
        catch
        {
            //append child and increment the count.
            _table.AppendChild(new SharedStringItem(new Text(text)));
            _dict.Add(text, _maxElement  );
            return _maxElement;
        }
        return indx;
    }
}
  • Related