I've got an XLSX sheet that contains about 30 columns and 130,000 rows.
In the past I used OleDb data reader to parse such files but it was problematic in case of reading unknown excel files with mixed cell data types.
I found ClosedXML but the issue I have with it is that the memory usage is much higher than in the case of OleDb. I might be missing something but it appears that the entire workbook has to be loaded before even a single cell can be accessed, my memory usage goes up by ~500MB when I want to read a single cell:
using (XLWorkbook workBook = new XLWorkbook(_path))
{
IXLWorksheet workSheet = workBook.Worksheet(tableName);
Console.WriteLine(workSheet.Cell(1, 1).Value);
}
Is there any way to optimise this?
- Lazy loading cells?
- Reducing the amount of loaded cell properties just to the ones that are of interest?
- Anything?
If there's no options to optimize this would you be able to recommend any other frameworks/libraries?
Thanks.
CodePudding user response:
it appears that the entire workbook has to be loaded before even a single cell can be accessed,
That is correct. This is ClosedXML's behaviour by design. It implies higher memory usage in order to fascilitate more powerfull cell manipulation. I'd suggest looking at a different library, maybe EPPlus, if memory usage is an issue for you.
CodePudding user response:
The memory usage might be hard to avoid, depending on the kind of the data in the file. Internally, Excel files use a "shared string" table to store a single copy of each string, and refer to these by index from the worksheet data. I imagine most libraries will load the entire shared strings table before reading any worksheet data. If your file has a lot of unique strings, it can take a bit of time and memory to load the entire shared strings table.
ClosedXML is one of the least efficient libraries for reading Excel data. As Francois mentioned, this is probably not surprising, because the library is intended to support a depth of features not supported by other libraries.
If all you want is data-reading capabilities, you can take a look at a library I maintain: Sylvan.Data.Excel. It is very easy to use, and is the fastest Excel data reader library for .NET. In this specific benchmark Sylvan is more than 10x faster than ClosedXML, and uses a tiny fraction of the memory.