I have a process that archives MongoDB collections by getting an IAsyncCursor and writing the raw document bytes out to an Azure Blob stream. This is quite efficient and works. Here is the working code.
var cursor = await clientDb.GetCollection<RawBsonDocument>(collectionPath)
    .Find(new BsonDocument())
    .ToCursorAsync();

while (await cursor.MoveNextAsync())
{
    foreach (var document in cursor.Current)
    {
        // Copy each document's raw BSON bytes straight into the blob stream.
        var bytes = new byte[document.Slice.Length];
        document.Slice.GetBytes(0, bytes, 0, document.Slice.Length);
        await blobStream.WriteAsync(bytes, 0, bytes.Length);
    }
}
However, the only way I've figured out how to move this data from the archive back into MongoDB is to load the entire raw byte array into a MemoryStream and then call .InsertOneAsync(). This works fine for smaller collections, but for very large collections I'm getting MongoDB errors, and it obviously isn't very memory efficient. Is there any way to stream raw byte data into MongoDB, or to use a cursor like I'm doing on the read?
var rawRef = clientDb.GetCollection<RawBsonDocument>(collectionPath);
using (var ms = new MemoryStream())
{
    // Buffer the entire archive in memory, then insert it in a single call.
    await stream.CopyToAsync(ms);
    var bytes = ms.ToArray();
    var rawBson = new RawBsonDocument(bytes);
    await rawRef.InsertOneAsync(rawBson);
}
Here is the error I get if the collection is too large.
MongoDB.Driver.MongoConnectionException : An exception occurred while sending a message to the server.
---- System.IO.IOException : Unable to write data to the transport connection: An established connection was aborted by the software in your host machine..
-------- System.Net.Sockets.SocketException : An established connection was aborted by the software in your host machine.
CodePudding user response:
Instead of copying the whole stream to a byte array and parsing that into a single RawBsonDocument, you can deserialize the documents one by one, e.g.:
// requires: using MongoDB.Bson.Serialization;
while (stream.Position < stream.Length)
{
    // Each BSON document is length-prefixed, so the serializer knows
    // exactly where one document ends and the next begins.
    var rawBson = BsonSerializer.Deserialize<RawBsonDocument>(stream);
    await rawRef.InsertOneAsync(rawBson);
}
The stream is read one document at a time. The sample above inserts each document directly into the database; if you want to insert in batches instead, you can collect a reasonable number of documents in a list and pass them to InsertManyAsync.
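For example, here is a minimal sketch of such batching, reusing the stream and rawRef from above (the batch size of 1000 is an arbitrary choice, not a tested recommendation):

var batch = new List<RawBsonDocument>();
while (stream.Position < stream.Length)
{
    batch.Add(BsonSerializer.Deserialize<RawBsonDocument>(stream));
    if (batch.Count == 1000)
    {
        // Send the current batch in a single round trip, then start a new one.
        await rawRef.InsertManyAsync(batch);
        batch.Clear();
    }
}
if (batch.Count > 0)
{
    // Flush the final partial batch.
    await rawRef.InsertManyAsync(batch);
}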