How can I optimize reading many excel files to dataTable

Time:06-03

How can I speed up these methods? It takes 10 seconds to read 5 files in a given directory. I use this code to read the files and merge them into one big Excel file. When I add up to 100 files, it takes about 10 minutes. Are there any tips to make it faster? Thanks in advance.

I used Microsoft.Office.Interop.Excel and ClosedXML.Excel.

Here's a sample of my code:

private DataTable DoExcel(string filePath)    //method to read files
{
    DataTable dt = new DataTable();
    _Application excel = new _Excel.Application();
    Workbook wb = excel.Application.Workbooks.Add();

    try
    {      
        wb = excel.Workbooks.Open(filePath);
        Worksheet ws = (_Excel.Worksheet)wb.Worksheets.get_Item(1);
        var Range = ws.UsedRange;

        object[,] data = Range.Value2;
        for (int cCnt = 1; cCnt <= Range.Columns.Count; cCnt++)
        {
            var Column = new DataColumn();
            Column.DataType = System.Type.GetType("System.String");
            Column.ColumnName = cCnt.ToString();
            dt.Columns.Add(Column);

            for (int rCnt = 3; rCnt <= Range.Rows.Count; rCnt++)
            {
                string cellVal = String.Empty;
                try
                {
                    cellVal = (data[rCnt, cCnt]).ToString();
                }
                catch (Microsoft.CSharp.RuntimeBinder.RuntimeBinderException)
                {
                    var ConvertVal = (double)(data[rCnt, cCnt]);
                    cellVal = ConvertVal.ToString();
                }
                DataRow row;
                if (cCnt == 1)
                {
                    row = dt.NewRow();
                    row[cCnt.ToString()] = cellVal;
                    dt.Rows.Add(row);
                }
                else
                { 
                     row = dt.Rows[rCnt - 3];
                     row[cCnt.ToString()] = cellVal;
                }//else
             }//for
         }
         wb.Close(0);
         excel.Quit();
     }
     catch (Exception ex)
     {
         wb.Close(0);
         excel.Quit();
     }
     return dt;
}

public void MergeExcel()             ////method to merge files
{
    try
    {
        DataTable mainDt = new DataTable();

        foreach (var file in System.IO.Directory.GetFiles(Sciezki.folderZPlikami))
        {
            var dt = DoExcel(file);
            mainDt.Merge(dt);
        }   

        using (var workbook = new XLWorkbook())
        {
            var worksheet = workbook.Worksheets.Add(mainDt, "AllData");
            workbook.SaveAs(Sciezki.folderKoncowy + " " + Sciezki.dataZapisu + ".xlsx");
            workbook.Dispose();               
        }
        MessageBox.Show("finish");
    }catch(Exception ex)
    {
    }           
}

Thanks for help

CodePudding user response:

Sorry if I am missing something, however I have to ask a few questions.

First, it seems quite odd to merge the worksheets as your code does. I would think that, at a minimum, each imported worksheet would go into its own worksheet rather than being merged with the others. Currently, this can put columns of different types into the same column, which seems odd and can easily lead to confusion.

Second, I have to ask “why” are you using BOTH “Interop” AND “ClosedXML”? Is there some reason you do not use a single Excel library to do as you want? It just seems odd to do this. If you are using this for non-commercial use, then I suggest you use EPPlus for the Excel library. It should be faster and easier.

Lastly, if you are stuck using Interop, there are a few ways to speed it up. One particularly slow aspect of Interop is using Range members in a for loop. For example, you have two lines of code that loop through the columns and rows of the Excel worksheet:

for (int cCnt = 1; cCnt <= Range.Columns.Count; cCnt++) { …

And

for (int rCnt = 3; rCnt <= Range.Rows.Count; rCnt++) { …

Using Range.Columns.Count and Range.Rows.Count as the ending conditions of the for loops slows things down, because the condition is re-evaluated on every iteration and each evaluation is a call into Excel through COM. So we want to take the Range object out of the ending condition. A simple solution is to create two variables and use them as the ending conditions instead. First create the two variables:

int colCount = Range.Columns.Count;
int rowCount = Range.Rows.Count;

Then in the for loops, use these variables as the ending condition like…

for (int cCnt = 1; cCnt <= colCount; cCnt++) { …

And…

for (int rCnt = 3; rCnt <= rowCount; rCnt++) { …

In my small tests, this cut the time by half, if not more.
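The same idea can be taken one step further. Once Range.Value2 has been copied into an object[,] array, as the question's code already does, the loops do not need to touch the Range object at all. The sketch below is only an illustration under stated assumptions: the class name RangeArrayConverter and the method ToDataTable are hypothetical, and the array is assumed to be the 1-based array that Value2 returns for a multi-cell range. It fills the table row by row and uses Convert.ToString, which returns an empty string for empty cells, so no exception handling is needed per cell.

```csharp
using System;
using System.Data;

static class RangeArrayConverter
{
    // Converts the 2-D array read from Range.Value2 into a DataTable of strings.
    // firstDataRow is the first array row to copy (the question starts at row 3).
    public static DataTable ToDataTable(object[,] data, int firstDataRow)
    {
        // Read the array bounds once; nothing inside the loops calls into Excel.
        int rowLo = data.GetLowerBound(0), rowHi = data.GetUpperBound(0);
        int colLo = data.GetLowerBound(1), colHi = data.GetUpperBound(1);

        var table = new DataTable();
        for (int c = colLo; c <= colHi; c++)
        {
            // Same column naming as the question: "1", "2", ...
            table.Columns.Add((c - colLo + 1).ToString(), typeof(string));
        }

        for (int r = Math.Max(firstDataRow, rowLo); r <= rowHi; r++)
        {
            var values = new object[colHi - colLo + 1];
            for (int c = colLo; c <= colHi; c++)
            {
                // Convert.ToString(object) yields "" for null (empty) cells.
                values[c - colLo] = Convert.ToString(data[r, c]) ?? string.Empty;
            }
            table.Rows.Add(values);
        }
        return table;
    }
}
```

In DoExcel this would be called as RangeArrayConverter.ToDataTable(data, 3) right after reading Range.Value2. One caveat: when the used range is a single cell, Value2 returns a scalar rather than an array, so that case would need separate handling.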

If you are still having issues, please let me know and I will try my best to help. I am aware that Interop can be notoriously slow, but there are ways to speed it up.

CodePudding user response:

When I add up to 100 files it takes about 10 minutes. Is there any tip to make it faster?

With Microsoft.Office.Interop.Excel it is definitely slow. I suggest doing it with the Microsoft Access Database Engine (the Microsoft.ACE.OLEDB provider) instead. Here is a demo.

I assume the worksheets in all of your files have the same name, "Sheet1".

Let's declare a class ExcelFileCombiner.

class ExcelFileCombiner
{

}

Now we will add methods to this class. First, let's define a method that returns the names of the source files. This is a demo, so I use an iterator block as an example; you should replace it with your own code, which can return IEnumerable<string>, string[], or List<string>.

static IEnumerable<string> GetSourceFileNames()
{
    yield return @"C:\Users\User\Desktop\example1.xlsx";
    yield return @"C:\Users\User\Desktop\example2.xlsx";
    yield return @"C:\Users\User\Desktop\example3.xlsx";
}
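As one possible replacement for the hard-coded iterator, the sketch below enumerates the .xlsx files of a folder, in the spirit of the Directory.GetFiles call in the question. The class name SourceFiles and the folder parameter are assumptions made for illustration. It also skips the "~$" owner files that Excel creates next to an open workbook, since those are not real workbooks.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class SourceFiles
{
    // Returns the .xlsx files in a folder, skipping Excel's "~$" owner files.
    public static IEnumerable<string> GetSourceFileNames(string folder)
    {
        return Directory.EnumerateFiles(folder, "*.xlsx")
            .Where(path => !Path.GetFileName(path).StartsWith("~$", StringComparison.Ordinal))
            // A stable order makes the combined output reproducible between runs.
            .OrderBy(path => path, StringComparer.OrdinalIgnoreCase);
    }
}
```

With the question's setup, the call would be something like SourceFiles.GetSourceFileNames(Sciezki.folderZPlikami).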

Define a method to get the destination file name. We will combine the source data into this file.

static string GetDestinationFileName()
{
    return @"C:\Users\User\Desktop\combine.xlsx";
}

Define a method to generate the insert command text for a DataTable.

static string CreateInsertCommandText(DataTable dataTable)
{
    var columnNames = dataTable
        .Columns.Cast<DataColumn>()
        .Select(c => c.ColumnName)
        .Aggregate((a, b) => $"{a},{b}");

    var values = dataTable
        .Columns.Cast<DataColumn>()
        .Select(c => "?")
        .Aggregate((a, b) => $"{a},{b}");

    return $"INSERT INTO [Sheet1$] ({columnNames}) VALUES ({values})";
}
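To make the generated text concrete, here is a small variant of the method above. It is a sketch rather than a drop-in replacement: the class name InsertCommandDemo and the optional sheetName parameter are assumptions. It wraps each column name in square brackets so that header names containing spaces remain valid in the SQL text, and it uses string.Join instead of Aggregate.

```csharp
using System.Data;
using System.Linq;

static class InsertCommandDemo
{
    // Builds a positional-parameter INSERT for every column of the table,
    // with each column name wrapped in square brackets.
    public static string CreateInsertCommandText(DataTable dataTable, string sheetName = "Sheet1")
    {
        var names = dataTable.Columns.Cast<DataColumn>()
            .Select(c => $"[{c.ColumnName}]");
        // One "?" placeholder per column, in column order.
        var placeholders = dataTable.Columns.Cast<DataColumn>().Select(c => "?");

        return $"INSERT INTO [{sheetName}$] ({string.Join(",", names)}) " +
               $"VALUES ({string.Join(",", placeholders)})";
    }
}
```

For a table with the columns Id and Unit Price, this returns INSERT INTO [Sheet1$] ([Id],[Unit Price]) VALUES (?,?).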

Define a RowUpdating handler for the OleDbDataAdapter. We need it while inserting rows into the destination file.

static private void Adapter_RowUpdating(object sender, OleDbRowUpdatingEventArgs e)
{
    e.Command.Parameters.Clear();
    e.Command.Parameters.AddRange(
    e.Row.Table.Columns.Cast<DataColumn>()
        .Select((c, i) => new OleDbParameter(c.ColumnName, e.Row.ItemArray[i]))
        .ToArray());
}

Define a method to save the retrieved data into the destination file.

static void Save(string fileName, DataTable dataTable)
{
    var builder = new OleDbConnectionStringBuilder()
    {
        Provider = "Microsoft.ACE.OLEDB.12.0",
        DataSource = fileName
    };
    builder.Add("Extended Properties", "Excel 12.0; HDR=Yes;");

    using (var connection = new OleDbConnection(builder.ConnectionString))
    {
        string commandText = CreateInsertCommandText(dataTable);
        using (var adapter = new OleDbDataAdapter())
        {
            adapter.RowUpdating += Adapter_RowUpdating;
            adapter.InsertCommand = new OleDbCommand(commandText, connection);
            adapter.Update(dataTable);
        }
    }
}

Define a method to retrieve data from a source Excel file.

static DataTable GetData(string sourceFileName)
{
    var builder = new OleDbConnectionStringBuilder()
    {
        Provider = "Microsoft.ACE.OLEDB.12.0",
        DataSource = sourceFileName
    };
    builder.Add("Extended Properties", "Excel 12.0; HDR=Yes;");

    using (var connection = new OleDbConnection(builder.ConnectionString))
    {

        string commandText = $"SELECT * FROM [Sheet1$]";
        using (var adapter = new OleDbDataAdapter(commandText, connection))
        {
            // Set 'AcceptChangesDuringFill = false' 
            // because we will save all data later on
            adapter.AcceptChangesDuringFill = false;
            var dataTable = new DataTable();
            adapter.Fill(dataTable);
            return dataTable;
        }
    }
}
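The reason AcceptChangesDuringFill matters can be shown without any Excel file. The sketch below (RowStateDemo is a hypothetical name) adds a row to a plain DataTable and reads its RowState before and after AcceptChanges. By default, Fill accepts changes on the rows it loads, which leaves them Unchanged, and a later adapter.Update would then have nothing to insert. Keeping the rows in the Added state is what lets Save run the InsertCommand for each of them.

```csharp
using System;
using System.Data;

static class RowStateDemo
{
    // Returns the RowState of a new row before and after AcceptChanges.
    public static DataRowState[] Run()
    {
        var table = new DataTable();
        table.Columns.Add("Name", typeof(string));

        DataRow row = table.Rows.Add("example");
        DataRowState before = row.RowState;   // Added: Update would run the InsertCommand

        table.AcceptChanges();                // what Fill does by default for loaded rows
        DataRowState after = row.RowState;    // Unchanged: Update would skip this row

        return new[] { before, after };
    }
}
```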

Finally, define a method that iterates through the names of the source files, retrieves the data from each source file, and saves it into the destination file.

public static void Do()
{
    string destinationFileName = GetDestinationFileName();
    foreach (var sourceFileName in GetSourceFileNames())
    {
        Save(destinationFileName, GetData(sourceFileName));
    }
}

Now, we can use this class from our application.

private void Button1_Click(object sender, EventArgs e)
{
    ExcelFileCombiner.Do();
}

I tested the code with the following source files (the screenshots are not preserved here):

C:\Users\User\Desktop\example1.xlsx

C:\Users\User\Desktop\example2.xlsx

C:\Users\User\Desktop\example3.xlsx

Before running the code, the destination file C:\Users\User\Desktop\combine.xlsx was empty. After running the code, it contained the combined data from the source files.
