Converting .doc, .pdf, and .html with Libreoffice
In 2014, I was trying to make a webapp that could save documents as .pdf and .doc files. These are a series of notes, only loosely edited, of the findings I had along the way:
Option 1: Use unoconv
- Install
Libreoffice 4.3.xon the server - Install
unoconvfrom master on the server - Use
bashto convert documents (see unoconv docs)
Gotchas:
There’s a special page style for HTML documents. You have to change that page style in order to have the changes reflected when converting from HTML to other document types.
Create Libreoffice Templates! (.ott) files
- Open up the
Styles and Formattingpane (wrench icon on top left, of F11) - Go to the
Page Stylestab - Change the HTML style to reflect what the page should look like with HTML
- Save as a template (.ott) file
Option 2: Use soffice
- Install
Libreoffice 4.3.xon the server (where the app is running) - Point to
libreoffice-install/program/soffice(either exe for Windows, or the executable soffice for unix) via the PATH or your.bashrcfile - Use
bashto convert documents
Bash commands:
soffice --headless --convert-to [document type] --output-dir [output directory] [file to convert]
document typethe file format to convert to.pdf,doc,docx,html, etc. are all fair game.output directorythe directory to output to. If left blank, will output to the current working directory.file to convertthe file to convert. TODO: figure out how to convert from a stdin buffer
As an example:
soffice --headless --convert-to docx --output-dir /doc index.html