Is it possible to store PDF files in a CQL blob type in Cassandra?

Time:06-16

To head off questions about why we use Cassandra instead of another database: we have to, because our customer decided so. In my opinion, that was completely the wrong decision.

In our application we have to deal with PDF documents, i.e. read them and populate them with data. So my intention was to keep the documents (templates) in the database, read them, and then do what we need to do with them.

I noticed that Cassandra provides a blob column type. However, it seems to me that this type has nothing to do with a BLOB in Oracle or another relational database.

  • From what I understand, Cassandra is not meant for storing documents, so is it simply not possible?
  • Or is the only way to turn the document into a byte array?
  • What is the intention of the blob column type?

Answer:

The blob type in Cassandra is used to store raw bytes, so it could "theoretically" be used to store PDF files as well (as bytes). But one thing should be taken into consideration: Cassandra doesn't work well with big payloads. The usual recommendation is to store tens or hundreds of KB, not more than 1 MB. With bigger payloads, operations such as repair or adding and removing nodes can lead to increased overhead and performance degradation. On older versions of Cassandra (2.x/3.0) I have seen situations where people couldn't add new nodes because the join operation failed. The situation is a bit better with newer versions, but it should still be evaluated before jumping into implementation. It's recommended to performance-test some maintenance operations at scale to understand whether it will work for your load. NoSQLBench is a great tool for such things.
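To make this concrete: a blob column holds exactly the bytes of the file, so turning the document into a byte array is indeed how it works. The Python sketch below reads a PDF as raw bytes, checks it against the roughly 1 MB guidance, and shows the hex form CQL uses for blob constants. The table name `pdf_templates`, its columns, the helper function names, and reading "1 MB" as 1,048,576 bytes are assumptions for illustration, not part of the answer.

```python
from pathlib import Path

# Assumption: "1 MB" from the answers interpreted as 1024 * 1024 bytes.
RECOMMENDED_MAX_BLOB_BYTES = 1024 * 1024

# Hypothetical table for PDF templates (names are illustrative only).
CREATE_TEMPLATES_TABLE = """
CREATE TABLE IF NOT EXISTS pdf_templates (
    template_name text PRIMARY KEY,
    content       blob
)
"""


def looks_like_pdf(data: bytes) -> bool:
    """PDF files start with the header bytes '%PDF-'."""
    return data.startswith(b"%PDF-")


def load_pdf_bytes(path: str) -> bytes:
    """Read a PDF file as raw bytes, i.e. the value a blob column stores."""
    data = Path(path).read_bytes()
    if not looks_like_pdf(data):
        raise ValueError(f"{path} does not look like a PDF file")
    return data


def fits_in_blob(data: bytes, limit: int = RECOMMENDED_MAX_BLOB_BYTES) -> bool:
    """True if the payload is within the recommended blob size."""
    return len(data) <= limit


def to_cql_blob_literal(data: bytes) -> str:
    """Render bytes as a CQL blob constant: '0x' followed by hex digits."""
    return "0x" + data.hex()
```

For example, `to_cql_blob_literal(b"%PDF-")` gives `0x255044462d`. In application code you would normally not build literals: with the DataStax Python driver (`cassandra-driver`), a `bytes` value can be bound directly to a blob column, e.g. `session.execute("INSERT INTO pdf_templates (template_name, content) VALUES (%s, %s)", ("invoice", data))`. The hex literal form is mainly handy for small values typed into cqlsh.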

Answer:

It is possible to store binary files in a CQL blob column; however, the general recommendation is to store only a small amount of data in blobs, preferably 1 MB or less, for optimum performance.

For larger files, it is better to place them in an object store and only save the metadata in Cassandra.
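The pattern of keeping large files in an object store and only metadata in Cassandra can be sketched as follows. This version combines it with the size guidance: small PDFs stay inline in a blob, larger ones get only a pointer. The `DocumentRow` shape, the `bucket_url` parameter, the SHA-256-based object key, and the threshold are all hypothetical; the actual upload to the object store is left out.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Assumption: same ~1 MB threshold as recommended in the answers.
INLINE_LIMIT_BYTES = 1024 * 1024


@dataclass
class DocumentRow:
    """Hypothetical row stored in Cassandra for one PDF."""
    name: str
    size_bytes: int
    sha256: str
    content: Optional[bytes]   # set only for small files (blob column)
    object_url: Optional[str]  # set only for large files (object store pointer)


def plan_storage(name: str, data: bytes, bucket_url: str,
                 limit: int = INLINE_LIMIT_BYTES) -> DocumentRow:
    """Decide whether a PDF goes inline in a blob or into an object store."""
    digest = hashlib.sha256(data).hexdigest()
    if len(data) <= limit:
        return DocumentRow(name, len(data), digest, content=data, object_url=None)
    # Hypothetical key layout: the caller is expected to upload `data`
    # to this URL with its object-store SDK before writing the row.
    url = f"{bucket_url.rstrip('/')}/{digest}.pdf"
    return DocumentRow(name, len(data), digest, content=None, object_url=url)
```

Keeping a content hash next to the URL is one optional design choice: it lets the application check that the object it fetched matches what the Cassandra row describes.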

Most large enterprises whose applications hold large amounts of media files (music, video, photos, etc.) typically store them in Amazon S3, Google Cloud Storage or Azure Blob Storage, then store the metadata (such as URLs) of the files in Cassandra. These enterprises are household names in streaming services and social media apps. Cheers!
