Home > Software engineering >  Way to export single sql table data(60gb) data to multiple csv files
Way to export single sql table data(60gb) data to multiple csv files

Time:10-21

I am new to data, Apology if I am not clear with my question.

My requirement to export sql table to csv file, I used cmd.ExecuteReader in c# and loaded successfully for up to 1gb tables.

But we have couple of other tables with size up to 60 gb.

Is there any efficient way to export large table data to multiple csv files?

Appreciate for any suggestions.

CodePudding user response:

You can export your table data in paginated way.You can use OFFSET, FETCH NEXT in your query

CodePudding user response:

For your comment:

Any recommendation for other that have not any key

You could simply count the number of records, stop when you reach your limitation and create a new file:

    /// <summary>
    /// Exports a full Table to CSV file(s). Each file will contain maximum number of records specified in "maxNbOfRows" parameter
    /// </summary>
    /// <param name="connection">Database connection to be used</param>
    /// <param name="tableName">Name of the table</param>
    /// <param name="outputFilepath">Filme path (ie: "c:\Export_Table{0:00000}.csv")</param>
    /// <param name="outputFileEncoding"></param>
    /// <param name="maxNbOfRows"></param>
    /// <param name="addHeaderRow"></param>
    /// <param name="separator"></param>
    /// <returns></returns>
    public static async Task<List<string>> ExportToSplittedCsv(System.Data.Common.DbConnection connection, string tableName, string outputFilepath, Encoding outputFileEncoding, long maxNbOfRows, bool addHeaderRow = true, string separator = ";")
    {
        var command = connection.CreateCommand();
        command.CommandText = $"SELECT * FROM {tableName}";
        long totalRecords = 0;
        long fileCount = 0;
        List<string> filesCreated = new List<string>();
        string filePattern = calculateFilePattern(outputFilepath);
        using (var reader = await command.ExecuteReaderAsync())
        {
            if (reader == null || !reader.HasRows)
                return filesCreated;//Table is empty
            while (reader.HasRows)
            {
                string curFileName = string.Format(filePattern,   fileCount);
                using (var writer = new System.IO.StreamWriter(curFileName, false, outputFileEncoding))
                {
                    totalRecords  = await _exportToSplittedCsv(reader, writer, maxNbOfRows,addHeaderRow,separator);
                }
                filesCreated.Add(filePattern);
            }
        }
        //You can return totalRecords or the list of files created or event the fileCount if you prefer
        return filesCreated;
    }

    /// <summary>
    /// Checks if the given output file has already a placeholder for the counter. If not, will add "{0:00000}" in the path to be able to add the file counter in the final file path...
    /// </summary>
    /// <param name="path">Original path</param>
    /// <returns></returns>
    private static string calculateFilePattern(string path)
    {
        //The path already contains the Counter placeHolder on it
        if (path.Contains("{0"))
            return path;

        int extIndex = path.LastIndexOf('.');
        if (extIndex == -1)
            return path   "{0:00000}";
        else
            return path.Substring(0, extIndex)   "{0:00000}"   path.Substring(extIndex);
    }

    /// <summary>
    /// Exports Database until reader has no more records or until maxNbOfRows is reached
    /// </summary>
    /// <param name="reader">System.Data.Common.DbDataReader used to read data from database</param>
    /// <param name="writer">Writer that will write CSV content in the current file</param>
    /// <param name="maxNbOfRows">Maximum number of rows to write on the output file</param>
    /// <param name="addHeaderRow">When true: first row in the CSV file will contain the Column names</param>
    /// <param name="separator">CSV field separator</param>
    /// <returns></returns>
    private static async Task<long> _exportToSplittedCsv(System.Data.Common.DbDataReader reader, System.IO.StreamWriter writer, long maxNbOfRows, bool addHeaderRow , string separator )
    {
        long rowCount = 0;
        if (addHeaderRow)
            await writer.WriteAsync(getColNames(reader, separator));
        while (await reader.ReadAsync() && (maxNbOfRows < 1 || rowCount < maxNbOfRows))
        {
            await writer.WriteLineAsync(buildCsvRow(reader, separator));
            rowCount  ;
        }
        await writer.FlushAsync();
        writer.Close();
        return rowCount;
    }

    /// <summary>
    ///  Builds CSV row containing the column names
    /// </summary>
    /// <param name="reader">System.Data.Common.DbDataReader used to read data from database</param>
    /// <param name="separator">CSV field separator</param>
    /// <returns></returns>
    private static string getColNames(System.Data.Common.DbDataReader reader, string separator)
    {
        var rowBuilder = new StringBuilder();
        for (int i = 0; i < reader.FieldCount; i  )
        {
            rowBuilder.Append(separator).Append(reader.GetName(i));
        }
        //We remove extra separator from the begin...
        return rowBuilder.Remove(0, 1).ToString();
    }

    /// <summary>
    /// Builds a CSV row containing a database row
    /// </summary>
    /// <param name="reader">System.Data.Common.DbDataReader used to read data from database</param>
    /// <param name="separator">CSV field separator</param>
    /// <returns></returns>
    private static string buildCsvRow(System.Data.Common.DbDataReader reader, string separator)
    {
        var rowBuilder = new StringBuilder();
        for (int i = 0; i < reader.FieldCount; i  )
        {
            //If you want to format DateTime to a specific value...
            /*switch (reader.GetDataTypeName(i))
            {
                case : //The typename changes depending of the Database type...
                    DateTime dt = reader.GetDateTime(i);
                    if (dt == null)
                        rowBuilder.Append(separator);
                    else
                        rowBuilder.Append(separator).Append(dt.ToString("yyyy-MM-dd HH:mm:ss"));
                    break;
                default:
                    rowBuilder.Append(separator).Append(reader.GetValue(i).ToString());
            }*/
            rowBuilder.Append(separator).Append(reader.GetValue(i).ToString());
        }
        //We remove extra separator from the begin...
        return rowBuilder.Remove(0, 1).ToString();
    }

This answer can be optimised with a Cancellation process. How to use it:

ExportToSplittedCsv(dbConnection,
            "customer_orders",
            "c:\\ExportedData\\customerOrders\\export_customerOrders{0:00000}.csv",
            System.Text.Encoding.Default, 60000, true, ";");
  • Related