I have a document with 3 million lines that I have to process and then insert into a table in a database. The database connector class has an Insert and an Update method that receive the data and perform the operation; both work perfectly.
The problem is that I want to split these modifications to the database to avoid overloading the DB. Ideally I would like to automatically split arrFinal into chunks of n lines and then handle each chunk separately, but I don't know how to do this.
This is the foreach I have right now:
ConnectDB cdb = new ConnectDB();
foreach (string s in arrFinal)
{
if (s.Split(';')[1] == "REQUEST")
{
cdb.Insert(s.Split(';')[0], s.Split(';')[2], s.Split(';')[3], s.Split(';')[4], s.Split(';')[5], dateLog);
}
else if (s.Split(';')[1] == "RESPONSE")
{
cdb.Update(s.Split(';')[0], s.Split(';')[5]);
}
}
In case you're wondering what the data looks like:
00:00:00.7443;REQUEST;POST;https://ulrName/Prices/;/1_0/962;https://ulrName/Prices/1_0/962
00:00:00.7793;RESPONSE;POST;https://ulrName/Prices/;/1_0/962;https://ulrName/Prices/1_0/962
Thank you in advance for your help. I'm open to trying any approach to get this right.
CodePudding user response:
Well, for starters, you really don't need all those Split calls - why recalculate the same thing so many times?
ConnectDB cdb = new ConnectDB();
foreach (string s in arrFinal)
{
var data = s.Split(';');
if (data[1] == "REQUEST")
{
cdb.Insert(data[0], data[2], data[3], data[4], data[5], dateLog);
}
else if (data[1] == "RESPONSE")
{
cdb.Update(data[0], data[5]);
}
}
You could also use something like spans to avoid allocating a string for every field.
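A minimal sketch of the span idea, assuming .NET 8+ (MemoryExtensions.Split writing field boundaries into a Span<Range> only exists there); everything else is the same loop as above:
Span<Range> fields = stackalloc Range[7]; // room for the 6 fields plus any trailing remainder
foreach (string s in arrFinal)
{
    ReadOnlySpan<char> line = s;
    int count = line.Split(fields, ';'); // records field boundaries without allocating a string[]
    if (count < 6) continue;             // skip malformed lines

    if (line[fields[1]].SequenceEqual("REQUEST"))
        cdb.Insert(s[fields[0]], s[fields[2]], s[fields[3]], s[fields[4]], s[fields[5]], dateLog);
    else if (line[fields[1]].SequenceEqual("RESPONSE"))
        cdb.Update(s[fields[0]], s[fields[5]]);
}
Strings are still allocated for the fields actually passed to Insert and Update, but nothing else.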
As for splitting the array:
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int batchSize)
{
using var enumerator = source.GetEnumerator();
while (enumerator.MoveNext())
{
yield return YieldBatchElements(enumerator, batchSize - 1);
}
}
private static IEnumerable<T> YieldBatchElements<T>(IEnumerator<T> source, int batchSize)
{
    // Emit the element the caller already advanced to, then pull up to
    // batchSize more items from the shared enumerator.
    yield return source.Current;
    for (var i = 0; i < batchSize && source.MoveNext(); i++)
        yield return source.Current;
}
To get all the results at once batched, just append .ToList() at the end when using the Batch() extension method.
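Hypothetical usage with the loop above, assuming Batch is declared in a static class that is in scope (the batch size of 500 is an arbitrary choice):
ConnectDB cdb = new ConnectDB();
foreach (var batch in arrFinal.Batch(500))
{
    foreach (string s in batch)
    {
        var data = s.Split(';');
        if (data[1] == "REQUEST")
            cdb.Insert(data[0], data[2], data[3], data[4], data[5], dateLog);
        else if (data[1] == "RESPONSE")
            cdb.Update(data[0], data[5]);
    }
    // batch boundary: pause, commit, or log here to ease the load on the DB
}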
Another option would be to load part of the file and process it, then the next part, and so on, as AntiqTech mentioned, but it would require saving how many lines have been processed so far so you can resume after an app failure.
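A minimal sketch of that idea; logPath, checkpointPath, and ProcessLine are placeholders of my own, not names from the original code:
// requires using System.IO; and using System.Linq;
int done = File.Exists(checkpointPath)
    ? int.Parse(File.ReadAllText(checkpointPath)) // resume where we left off
    : 0;

int processed = 0;
foreach (string s in File.ReadLines(logPath).Skip(done)) // streams lazily instead of loading 3M lines
{
    ProcessLine(s); // the REQUEST/RESPONSE handling from above
    processed++;
    if (processed % 500 == 0) // checkpoint every 500 lines
        File.WriteAllText(checkpointPath, (done + processed).ToString());
}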
CodePudding user response:
I've finally managed to accomplish what I wanted thanks to this answer in another post:
How to split an array into chunks of specific size?
My final code goes like this:
String[][] chunks = arrFinal
    .Select((s, i) => new { Value = s, Index = i })    // pair each line with its index
    .GroupBy(x => x.Index / 500)                       // bucket the indices into groups of 500
    .Select(grp => grp.Select(x => x.Value).ToArray()) // strip the indices back off
    .ToArray();
ConnectDB cdb = new ConnectDB();
for (int i = 0; i < chunks.Length; i++)
{
    foreach (string s in chunks[i]) // process one 500-line chunk at a time
{
var data = s.Split(';');
if (data[1] == "REQUEST")
{
cdb.Insert(data[0], data[2], data[3], data[4], data[5], dateLog);
}
else if (data[1] == "RESPONSE")
{
cdb.Update(data[0], data[5]);
}
}
}
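If the goal is to actually space the load out over time and not just partition the loop, something has to happen between chunks; a hypothetical throttle (the one-second pause is an assumption, not part of the original post) would go at the end of each iteration:
for (int i = 0; i < chunks.Length; i++)
{
    // ... inner foreach over chunks[i] as above ...
    System.Threading.Thread.Sleep(1000); // breathe between 500-row batches
}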