Preventing data loss on .Net service restart

A .Net service aggregates some metrics and flushes them once a day. If the service shuts down or restarts, the data lost can range from a few minutes' worth to, in the worst case, a whole day's worth.

If I flush more frequently to minimize, if not prevent, the loss, say hourly instead of daily, the rows logged would shoot up from hundreds of millions to at least tens of billions per day. Flushing just before a graceful shutdown/restart cannot prevent losses when the service shuts down or restarts abruptly.

Which C# programming constructs or event-handling mechanisms can help me keep the logging frequency low while keeping the potential loss to as small a percentage as possible?

(I hope the question is specific/focused enough. If you think it is not, I would be glad to discuss it in the comments.)

CodePudding user response:

If you are sure that the service is closed gracefully, not the fast-and-hard way, then in a .NET web application you can register a callback on the application-stopping event and save your data there. That callback needs access to the current log state, so the logging service must be able to expose the data it has accumulated but not yet flushed.

// IApplicationLifetime is obsolete in ASP.NET Core 3.0+; prefer IHostApplicationLifetime there.
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.Hosting;

public class Startup
{
    public void Configure(IApplicationBuilder app, IHostApplicationLifetime applicationLifetime)
    {
        // Register a callback that runs when the host begins a graceful shutdown.
        applicationLifetime.ApplicationStopping.Register(OnShutdown);
        // Additional configuration, etc...
    }

    private void OnShutdown()
    {
        // This code is called when the application is stopping; flush pending metrics here.
    }
}
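
If the service runs as a worker on the generic host rather than as a web application, the same "flush on graceful stop" idea can be expressed by overriding StopAsync on a BackgroundService, which also gives the shutdown code direct access to the aggregator it needs to flush. This is only a sketch; IMetricsAggregator and FlushAsync are hypothetical placeholders for however the aggregated state is actually held.

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

// Hypothetical holder of the in-memory aggregates (an assumption, not from the question).
public interface IMetricsAggregator
{
    Task FlushAsync(CancellationToken cancellationToken);
}

public class MetricsWorker : BackgroundService
{
    private readonly IMetricsAggregator _aggregator;

    public MetricsWorker(IMetricsAggregator aggregator) => _aggregator = aggregator;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Normal daily flush loop; cancellation simply ends the loop.
        try
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                await Task.Delay(TimeSpan.FromDays(1), stoppingToken);
                await _aggregator.FlushAsync(CancellationToken.None);
            }
        }
        catch (OperationCanceledException)
        {
            // Shutdown requested; StopAsync below performs the final flush.
        }
    }

    public override async Task StopAsync(CancellationToken cancellationToken)
    {
        // Called on graceful shutdown: flush whatever accumulated since the last daily flush.
        await _aggregator.FlushAsync(cancellationToken);
        await base.StopAsync(cancellationToken);
    }
}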

If the issue is the number of database calls, which forces you to batch log entries, then I would first log to a local file more often, and let a background service periodically scan that file and upload its contents to the database in batches. This requires marking which entries have already been uploaded; in some scenarios, when the database call succeeded but the app crashed before the file contents were marked as uploaded, a duplicate upload would happen next time. That can be averted by comparing the last log entry in the file with the last uploaded entry, by storing a marker in the database in the same transaction as the log upload, or by checking, let's say, the timestamp of the last uploaded log versus the log in local storage.
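
A minimal sketch of that approach, assuming a caller-supplied upload delegate standing in for the real database write; the file names, the 5-minute interval, and the offset-file marker are illustrative only, not from the answer above.

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

// Periodically ships locally spooled log lines to the database in batches.
// The byte offset of the last uploaded line is kept in a small side file,
// which plays the role of the "uploaded" marker described above.
public class LogShipper : BackgroundService
{
    private readonly string _logPath = "metrics.log";        // local spool file (illustrative path)
    private readonly string _offsetPath = "metrics.offset";  // last-uploaded marker (illustrative path)
    private readonly Func<IReadOnlyList<string>, CancellationToken, Task> _uploadBatchAsync;

    public LogShipper(Func<IReadOnlyList<string>, CancellationToken, Task> uploadBatchAsync)
        => _uploadBatchAsync = uploadBatchAsync; // hypothetical DB upload, supplied by the caller

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                await ShipPendingAsync(stoppingToken);
                await Task.Delay(TimeSpan.FromMinutes(5), stoppingToken);
            }
            catch (OperationCanceledException)
            {
                // Shutdown requested; exit the loop.
            }
        }
    }

    private async Task ShipPendingAsync(CancellationToken ct)
    {
        if (!File.Exists(_logPath)) return;

        long offset = File.Exists(_offsetPath)
            ? long.Parse(await File.ReadAllTextAsync(_offsetPath, ct))
            : 0;

        var batch = new List<string>();
        long newOffset;
        using (var stream = new FileStream(_logPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            // The sketch ignores a concurrently written partial last line for brevity.
            stream.Seek(offset, SeekOrigin.Begin);
            using var reader = new StreamReader(stream);
            string? line;
            while ((line = await reader.ReadLineAsync()) != null)
                batch.Add(line);
            newOffset = stream.Position;
        }

        if (batch.Count == 0) return;

        // If the app crashes after this call but before the offset is saved, the same
        // batch is re-uploaded next time; de-duplicate on the DB side (e.g. by timestamp
        // or a unique key), or store the offset in the same transaction as the upload.
        await _uploadBatchAsync(batch, ct);
        await File.WriteAllTextAsync(_offsetPath, newOffset.ToString(), ct);
    }
}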

CodePudding user response:

  1. Save the current aggregated value every hour and then delete the aggregated value from the prior hour (see the sketch after this list).
  2. Save the once-a-day aggregated value and delete the last hourly value.
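
A minimal sketch of that rolling-checkpoint idea, assuming hypothetical save/delete delegates against whatever store holds the aggregates; only the most recent hourly value plus the daily values ever exist at once, so the worst-case loss shrinks to about an hour without multiplying the stored rows.

using System;
using System.Threading;
using System.Threading.Tasks;

// Rolling checkpoint: at most one hourly row plus the daily rows are ever stored,
// so a crash loses at most the data accumulated since the last hourly save.
public class RollingCheckpoint
{
    private readonly Func<string, double, CancellationToken, Task> _saveAsync;  // hypothetical store write
    private readonly Func<string, CancellationToken, Task> _deleteAsync;        // hypothetical store delete
    private string? _lastHourlyKey;

    public RollingCheckpoint(
        Func<string, double, CancellationToken, Task> saveAsync,
        Func<string, CancellationToken, Task> deleteAsync)
    {
        _saveAsync = saveAsync;
        _deleteAsync = deleteAsync;
    }

    // Call once per hour with the running aggregate.
    public async Task SaveHourlyAsync(DateTime hourUtc, double aggregate, CancellationToken ct)
    {
        var key = $"hourly:{hourUtc:yyyy-MM-ddTHH}";
        await _saveAsync(key, aggregate, ct);            // 1. save the current hourly value
        if (_lastHourlyKey != null)
            await _deleteAsync(_lastHourlyKey, ct);      //    then delete the prior hour's value
        _lastHourlyKey = key;
    }

    // Call once per day with the full-day aggregate.
    public async Task SaveDailyAsync(DateTime dayUtc, double aggregate, CancellationToken ct)
    {
        await _saveAsync($"daily:{dayUtc:yyyy-MM-dd}", aggregate, ct);  // 2. save the daily value
        if (_lastHourlyKey != null)
        {
            await _deleteAsync(_lastHourlyKey, ct);                     //    and delete the last hourly value
            _lastHourlyKey = null;
        }
    }
}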

Narrowing down the data, or de-duplicating adjacent points in time, may also help.

Examples:

  • Discard the latest data point if it is within 0.01% of the prior data point and arrived within the last 500 milliseconds.
  • Keep an observed date-time and an end date-time for each data point: the observed date-time records when the value was first seen, and every point observed between it and the end date-time is within 0.1% of that first value. This can also time out, so that no more than 30 seconds of data go into one sample.

Both approaches depend on whether your business users agree to the tolerances, the loss in the number of events, the sampling frequency, etc.; a sketch of the first rule is below.
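
A minimal sketch of the first rule; the 0.01% and 500 ms thresholds are just the example values from the list above and would be whatever the business agrees to.

using System;

// Adjacent-time de-duplication: drop a new point if it is both very close in value
// to the previous kept point and very close to it in time.
public class AdjacentDeduplicator
{
    private readonly double _relativeTolerance;   // e.g. 0.0001 for 0.01%
    private readonly TimeSpan _window;            // e.g. 500 ms
    private double? _lastValue;
    private DateTime _lastTimestampUtc;

    public AdjacentDeduplicator(double relativeTolerance, TimeSpan window)
    {
        _relativeTolerance = relativeTolerance;
        _window = window;
    }

    // Returns true if the point should be kept (logged), false if it can be discarded.
    public bool ShouldKeep(double value, DateTime timestampUtc)
    {
        if (_lastValue is double prior)
        {
            bool closeInValue = Math.Abs(value - prior) <= Math.Abs(prior) * _relativeTolerance;
            bool closeInTime = timestampUtc - _lastTimestampUtc <= _window;
            if (closeInValue && closeInTime)
                return false;   // within tolerance and within the window: discard
        }

        _lastValue = value;
        _lastTimestampUtc = timestampUtc;
        return true;
    }
}

// Usage (values are illustrative):
// var dedup = new AdjacentDeduplicator(0.0001, TimeSpan.FromMilliseconds(500));
// if (dedup.ShouldKeep(reading, DateTime.UtcNow)) { /* log the point */ }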
