Converting .pages files using LibreOffice in Docker returns empty file

Time:12-24

I am trying to run LibreOffice in a Docker container to convert some .pages files to PDF. The application is a Web API and runs perfectly on a Windows virtual machine. I am new to Linux, Docker, and containers.

I have been able to deploy everything to a container and call the API, but I am just getting an empty document back, and I have no idea why. I'm also unsure of the best way to debug this issue, so any advice is greatly appreciated.

Here is how I am installing LibreOffice in the Dockerfile.

FROM mcr.microsoft.com/dotnet/aspnet:6.0 AS base
EXPOSE 80
RUN apt-get update
RUN apt-get install -y libreoffice

Here is the relevant part of my application responsible for doing the conversion.

string libreOfficeArgs = "--norestore --nofirststartwizard --headless --convert-to pdf \"{inputFile}\" --outdir \"{outputFolder}\"";
string libreOfficeExe = "/usr/bin/libreoffice";
//string libreOfficeExe = "/usr/bin/soffice"; Doesn't work either.

var conversionArgs = libreOfficeArgs.Replace("{inputFile}", inputPath).Replace("{outputFolder}", Path.GetDirectoryName(inputPath));

var conversionProcess = new Process
{
    StartInfo = new ProcessStartInfo
    {
        FileName = libreOfficeExe,
        Arguments = conversionArgs
    }
};

conversionProcess.Start();
await conversionProcess.WaitForExitAsync(); //TODO: Timeout?
conversionProcess.Close();

//I then read the output file into a stream and the API returns the stream

Any advice on how to investigate further or fix my problem would be greatly appreciated.

EDIT:

I can see the following in the logs, so the API is clearly calling LibreOffice. I think the problem could be related to how I am installing it.

convert /tmp/tmpuYq5ri.pages -> /tmp/tmpuYq5ri.pdf using filter : writer_Pdf_Export

EDIT 2:

Here is how the stream is being read.

var outputFilePath = Path.ChangeExtension(inputPath, "pdf");

var ms = new MemoryStream();
// Open the converted PDF (same variable as declared above) and copy it into memory.
using (var fs = new FileStream(outputFilePath, FileMode.Open))
{
    await fs.CopyToAsync(ms);
    ms.Seek(0, SeekOrigin.Begin);
}

CodePudding user response:

It seems you are trying to convert a .pages file. According to this source and this bug, converting a .pages file with older versions of LibreOffice yields a blank document, which would explain your issue.

Try updating LibreOffice to a version where this bug is fixed by modifying your Dockerfile:

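FROM mcr.microsoft.com/dotnet/aspnet:6.0 AS base
EXPOSE 80
# Refresh the package index first; -y avoids the interactive prompt during the build.
RUN apt-get update && apt-get install -y libreoffice-java-common
ADD https://ftp.gwdg.de/pub/tdf/libreoffice/stable/7.4.3/deb/x86_64/LibreOffice_7.4.3_Linux_x86-64_deb.tar.gz .
RUN tar zxvf LibreOffice_7.4.3_Linux_x86-64_deb.tar.gz
# Install every extracted package; the glob is expanded by the shell that runs this step.
RUN cd LibreOffice_7.4.3.2_Linux_x86-64_deb/DEBS && dpkg -i *.deb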

Also note that the LibreOffice executable is installed at a different path: /opt/libreoffice7.4/program/soffice
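
As for how to investigate further: it may help to capture what LibreOffice prints and the exit code it returns, since the code in the question discards both. The following is only a minimal sketch, assuming .NET 5 or later (for WaitForExitAsync). The class name LibreOfficeDiagnostics and the method RunAsync are hypothetical names made up for this example, not part of any library.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class LibreOfficeDiagnostics
{
    // Hypothetical helper (not from the original post): runs a command and returns
    // its exit code together with everything it wrote to stdout and stderr.
    public static async Task<(int ExitCode, string StdOut, string StdErr)> RunAsync(
        string fileName, string arguments, TimeSpan timeout)
    {
        using var process = new Process
        {
            StartInfo = new ProcessStartInfo
            {
                FileName = fileName,
                Arguments = arguments,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                UseShellExecute = false // required when redirecting streams
            }
        };

        process.Start();

        // Start reading both streams before waiting, so a full pipe buffer
        // cannot block the child process.
        Task<string> stdOutTask = process.StandardOutput.ReadToEndAsync();
        Task<string> stdErrTask = process.StandardError.ReadToEndAsync();

        using var cts = new CancellationTokenSource(timeout);
        try
        {
            await process.WaitForExitAsync(cts.Token);
        }
        catch (OperationCanceledException)
        {
            // The timeout elapsed: stop the process (and its children) and report it.
            process.Kill(entireProcessTree: true);
            throw new TimeoutException($"{fileName} did not exit within {timeout}.");
        }

        return (process.ExitCode, await stdOutTask, await stdErrTask);
    }
}
```

In the question's code this could replace the Process block, for example `var result = await LibreOfficeDiagnostics.RunAsync(libreOfficeExe, conversionArgs, TimeSpan.FromMinutes(2));`, followed by logging result.ExitCode, result.StdOut and result.StdErr. Checking `new FileInfo(outputFilePath).Length` before reading the PDF would also show whether the file on disk is empty or whether the content is lost later in the API.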
