I am trying to send the contents of Word documents and PDFs to Apache OpenNLP. I am wondering whether I can use ActiveMQ to read the MS Word files so that I can trigger a process in Apache Kafka to process the stream.
Any suggestions for streaming PDF or Word files other than ActiveMQ are welcome.
CodePudding user response:
Message queues generally shouldn't be used for file transfer. Put the files in blob storage such as S3, send the URI between clients (e.g. "s3://bucket/file.txt"), then download and process the file elsewhere.
Another option is to use Apache POI or a similar tool in the producer client to parse your files, then send the extracted data in whatever format you want (JSON, Avro, and Protobuf are generally more common in streaming tools than XML).
The actual file processing has nothing to do with the queue technology used.
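A rough sketch of the first option (send a pointer, not the file), in plain Java with only the standard library. The class name, method names, and message fields are hypothetical, the JSON is built by hand without escaping, and the upload to blob storage and the actual Kafka or ActiveMQ send are deliberately left out:

```java
import java.net.URI;

// Sketch only: the message carries the file's location, never its bytes.
class ClaimCheck {

    // Producer side: build the small payload to publish once the upload has succeeded.
    // Naive string concatenation; a real client would use a JSON library.
    static String toJson(String uri, String contentType) {
        return "{\"uri\":\"" + uri + "\",\"contentType\":\"" + contentType + "\"}";
    }

    // Consumer side: split an s3://bucket/key URI into { bucket, key }
    // so the file can be fetched from blob storage and then parsed.
    static String[] bucketAndKey(String s3Uri) {
        URI u = URI.create(s3Uri);
        if (!"s3".equals(u.getScheme())) {
            throw new IllegalArgumentException("not an s3 URI: " + s3Uri);
        }
        String path = u.getPath();
        if (u.getAuthority() == null || path == null || path.length() < 2) {
            throw new IllegalArgumentException("missing bucket or key: " + s3Uri);
        }
        // Drop the leading '/' from the path to get the object key.
        return new String[] { u.getAuthority(), path.substring(1) };
    }

    public static void main(String[] args) {
        String payload = toJson("s3://docs-bucket/reports/q1.docx",
                "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        System.out.println(payload); // this string is what goes on the topic or queue

        String[] parts = bucketAndKey("s3://docs-bucket/reports/q1.docx");
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```

The consumer would take the bucket and key, download the object, and only then hand the text to POI and OpenNLP, so the broker only ever sees a message of a few hundred bytes.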
CodePudding user response:
If you use ActiveMQ "Classic" (i.e. any 5.x version) you'll have problems moving large messages, as there's no real support for that use case. However, ActiveMQ Artemis (i.e. ActiveMQ's next-gen broker) supports arbitrarily large messages, which would facilitate your use case. The nice thing about having large message support in the broker is that you don't have to involve some other kind of storage mechanism in your solution. That makes development and maintenance of your application and environment a bit simpler.