Deploying a JODConverter + LibreOffice Environment on Ubuntu

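Preface

The main purpose of this post is to test whether LibreOffice's failure to convert certain docx documents depends on the operating system. Conversion had failed for me on both CentOS and macOS, yet an open-source project handled the same files, and that project runs LibreOffice on Ubuntu. So I repeated the test on Ubuntu to track down the cause.

Experiment conclusion

Some documents still fail to convert. This is not related to the operating system; my guess is that it is related to fonts.

Official site: https://zh-cn.libreoffice.org/download/libreoffice/

Parameter reference: https://help.libreoffice.org/Common/Starting_the_Software_With_Parameters/zh-CN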

Setting the APT sources

You can skip this step if the system image comes from Tencent Cloud or Alibaba Cloud.

Alibaba Cloud mirror

# The codename (bionic here) must match your Ubuntu release; check it with: lsb_release -cs
cat > /etc/apt/sources.list <<'EOF'
deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
EOF
apt-get clean && apt-get update

For bionic (Ubuntu 18.04), /etc/apt/sources.list then contains:

deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
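
The ten entries above follow a regular pattern (two entry types times five suites), so they can also be generated for whatever release is installed. The following is only a sketch: gen_sources is a hypothetical helper name, and the component list is copied from the listing above.

```shell
#!/usr/bin/env bash
# Print sources.list entries for one mirror and one release codename.
# gen_sources is a hypothetical helper, not part of APT.
gen_sources() {
  local mirror="$1" codename="$2" type suite
  for type in deb deb-src; do
    # The empty string is the base suite, e.g. plain "bionic".
    for suite in "" -security -updates -proposed -backports; do
      printf '%s %s %s%s main restricted universe multiverse\n' \
        "$type" "$mirror" "$codename" "$suite"
    done
  done
}
```

To write the file for the running system, one could call: gen_sources http://mirrors.aliyun.com/ubuntu/ "$(lsb_release -cs)" > /etc/apt/sources.list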

Tencent Cloud default sources

vi /etc/apt/sources.list

The Tencent Cloud defaults are:

deb http://mirrors.tencentyun.com/ubuntu/ bionic main restricted universe multiverse
deb http://mirrors.tencentyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb http://mirrors.tencentyun.com/ubuntu/ bionic-updates main restricted universe multiverse
#deb http://mirrors.tencentyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
#deb http://mirrors.tencentyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.tencentyun.com/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.tencentyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.tencentyun.com/ubuntu/ bionic-updates main restricted universe multiverse
#deb-src http://mirrors.tencentyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
#deb-src http://mirrors.tencentyun.com/ubuntu/ bionic-backports main restricted universe multiverse

Update the package index

apt-get clean && apt-get update

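Installing fonts

By default, Chinese characters in the converted output are rendered as empty boxes.

You will also notice that the fonts in the converted PDF differ from those in the original document. Both problems occur because the system lacks the required fonts, so we need to install them.

apt-get install -y ttf-mscorefonts-installer &&\
apt-get install -y fontconfig &&\
apt-get install -y xfonts-utils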

List the fonts currently installed

fc-list

Go to the directory

cd /usr/share/

There you will see the fonts and fontconfig directories.

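Adding fonts

Create the directory and enter it

mkdir -p /usr/share/fonts/chinese
cd /usr/share/fonts/chinese

On a Windows machine, open C:\Windows\Fonts, find the fonts with Chinese names (they are listed in the last few columns), and upload them all to /usr/share/fonts/chinese.

Set the directory permissions

chmod -R 755 /usr/share/fonts/chinese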

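The last step is to edit the fontconfig configuration file. Open it in an editor:

vi /etc/fonts/fonts.conf

The file contains a "Font list" section, that is, the list of font directories. Add the location of the Chinese fonts there:

<dir>/usr/share/fonts/chinese</dir>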

Finally, refresh the font cache so that no reboot is needed:

cd /usr/share/fonts/chinese &&\
mkfontscale &&\
mkfontdir &&\
fc-cache -fv

That completes all the steps. Run fc-list once more to check the font list:

fc-list
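
Reading the full fc-list output by eye is error-prone, so a small filter can confirm that fontconfig actually picked up the new directory. This is a sketch that assumes the default fc-list output format, where each line starts with the font file path followed by a colon; count_fonts_under is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count fc-list entries whose font file lives under the given directory.
# Reads fc-list output on stdin; count_fonts_under is a hypothetical helper.
count_fonts_under() {
  # grep -c prints the number of matching lines; it exits non-zero when the
  # count is 0, so "|| true" keeps the function's exit status clean.
  grep -c "^$1/" || true
}
```

Usage: fc-list | count_fonts_under /usr/share/fonts/chinese. A result of 0 would suggest that the directory entry in fonts.conf or the cache refresh did not take effect.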

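Installing LibreOffice

Installing with APT

Search for the packages

sudo apt-cache search libreoffice

Install

# Remove any existing installation (quoted so the shell does not expand the pattern)
sudo apt remove 'libreoffice-*'
# Install LibreOffice
sudo apt-get install libreoffice
# Install the Chinese language pack
sudo apt-get install libreoffice-l10n-zh-cn libreoffice-help-zh-cn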

Check the version

soffice --version

Output

LibreOffice 6.0.7.3 00m0(Build:3)

Find the executable

which soffice

Output

/usr/bin/soffice

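Create a working directory

sudo mkdir /usr/local/office_package
cd /usr/local/office_package

Convert

soffice --headless --convert-to pdf /usr/local/office_package/5.docx --outdir /usr/local/office_package/

Note

LibreOffice and OpenOffice share a common origin, so early LibreOffice releases used the same command-line syntax as OpenOffice. Later LibreOffice releases changed it: for example, LibreOffice takes --headless, while OpenOffice takes -headless.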

Installing from the official download

Create a working directory

sudo mkdir /usr/local/office_package
cd /usr/local/office_package

Download

wget https://mirror-hk.koddos.net/tdf/libreoffice/stable/6.4.4/deb/x86_64/LibreOffice_6.4.4_Linux_x86-64_deb.tar.gz

Extract

cd /usr/local/office_package/
tar -zxvf LibreOffice_6.4.4_Linux_x86-64_deb.tar.gz

Install

cd /usr/local/office_package/LibreOffice_6.4.4.2_Linux_x86-64_deb/DEBS/
sudo dpkg -i *.deb

Download the Chinese language pack

cd /usr/local/office_package/
wget https://download.nus.edu.sg/mirror/tdf/libreoffice/stable/6.4.4/deb/x86_64/LibreOffice_6.4.4_Linux_x86-64_deb_langpack_zh-CN.tar.gz

Extract

tar -zxvf LibreOffice_6.4.4_Linux_x86-64_deb_langpack_zh-CN.tar.gz

Enter the directory and install

cd /usr/local/office_package/LibreOffice_6.4.4.2_Linux_x86-64_deb_langpack_zh-CN/DEBS/
sudo dpkg -i *.deb

Check the version

libreoffice6.4 --version

Find the executable

which libreoffice6.4

Uninstall

sudo apt remove -y 'libreoffice*' &&\
sudo apt purge -y 'libreoffice*' &&\
sudo apt -y autoremove

Conversion test

Download a sample docx document

cd /usr/local/office_package/
wget http://wordupload.xhkjedu.com/resource/0/0.docx

Conversion parameters

--convert-to pdf:writer_pdf_Export 1.doc
--convert-to "html:XHTML Writer File:UTF8" 1.doc
--convert-to "txt:Text (encoded):UTF8" 1.doc

These can be shortened to

--convert-to pdf 1.doc
--convert-to html 1.doc
--convert-to txt 1.doc

docx=>pdf

soffice --headless --invisible --convert-to pdf /usr/local/office_package/5.docx --outdir /usr/local/office_package/

Or

soffice --headless --invisible --convert-to pdf:writer_pdf_Export /usr/local/office_package/5.docx --outdir /usr/local/office_package/

Or

soffice -env:UserInstallation=file:///$HOME/.libreoffice-headless/ \
--headless --invisible --convert-to pdf /usr/local/office_package/0.docx --outdir /usr/local/office_package/

In my case this succeeded right away.

docx=>jpg

soffice --headless --invisible --convert-to jpg /usr/local/office_package/0.docx --outdir /usr/local/office_package/

Or

soffice -env:UserInstallation=file:///$HOME/.libreoffice-headless/ \
--headless --invisible --convert-to jpg /usr/local/office_package/5.docx --outdir /usr/local/office_package/

docx=>html

soffice -env:UserInstallation=file:///$HOME/.libreoffice-headless/ \
--headless --invisible --convert-to html /usr/local/office_package/5.docx --outdir /usr/local/office_package/
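
When scripting these conversions, it can help to confirm that the output file really exists rather than relying on the command alone. The sketch below wraps the PDF conversion shown above; convert_to_pdf is a hypothetical helper name, and the SOFFICE variable is an assumption introduced here so that the binary can be swapped out (for example in tests).

```shell
#!/usr/bin/env bash
# Convert one document to PDF and verify that the PDF was actually written.
# convert_to_pdf and the SOFFICE override are hypothetical, for illustration.
convert_to_pdf() {
  local input="$1" outdir="$2" base
  base="$(basename "$input")"
  base="${base%.*}"   # strip the extension: 5.docx -> 5
  "${SOFFICE:-soffice}" --headless --convert-to pdf "$input" --outdir "$outdir" || return 1
  # Succeed only if a non-empty PDF with the expected name exists.
  [ -s "$outdir/$base.pdf" ]
}
```

Usage: convert_to_pdf /usr/local/office_package/5.docx /usr/local/office_package/ || echo "conversion failed"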

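Troubleshooting

Problem description:

After a LibreOffice conversion fails (for example, when converting a WPS file), subsequent conversions silently do nothing.

Likewise, while one LibreOffice instance is running, starting another LibreOffice conversion does nothing.

The cause is a conversion process that is still running, so the fix is to kill that process.

top

Or

top -bc |grep soffice.bin

Find the hung process and kill it:

kill -9 <PID>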
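
Instead of hunting for hung processes afterwards, a conversion can be given a time limit up front. The sketch below combines the timeout command from GNU coreutils with the -env:UserInstallation option used earlier, giving each run its own temporary profile directory. My assumption is that separate profiles keep one run from interfering with another; I have not verified that this cures every hang. convert_isolated and the SOFFICE override are hypothetical names.

```shell
#!/usr/bin/env bash
# Run one conversion with a time limit and a throwaway profile directory.
# convert_isolated and the SOFFICE override are hypothetical, for illustration.
convert_isolated() {
  local seconds="$1" input="$2" outdir="$3" profile rc=0
  profile="$(mktemp -d)"
  # timeout sends TERM after $seconds and KILL 5 seconds later if needed;
  # it exits with status 124 when the time limit was hit.
  timeout -k 5 "$seconds" "${SOFFICE:-soffice}" \
    "-env:UserInstallation=file://$profile" \
    --headless --convert-to pdf "$input" --outdir "$outdir" || rc=$?
  rm -rf "$profile"
  return "$rc"
}
```

Usage: convert_isolated 60 /usr/local/office_package/5.docx /usr/local/office_package/. An exit status of 124 means the conversion was stopped for exceeding the limit.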

Using it from a backend

Option 1 (using a third-party library)

<dependency>
    <groupId>org.jodconverter</groupId>
    <artifactId>jodconverter-core</artifactId>
    <version>4.2.0</version>
</dependency>
<dependency>
    <groupId>org.jodconverter</groupId>
    <artifactId>jodconverter-local</artifactId>
    <version>4.2.0</version>
</dependency>
<dependency>
    <groupId>org.jodconverter</groupId>
    <artifactId>jodconverter-spring-boot-starter</artifactId>
    <version>4.2.0</version>
</dependency>
<dependency>
    <groupId>org.libreoffice</groupId>
    <artifactId>ridl</artifactId>
    <version>5.4.2</version>
</dependency>

application.yml (the settings below use YAML syntax, so they belong in application.yml rather than application.properties)

jodconverter:
  local:
    enabled: true
    office-home: /opt/libreoffice6.1
    port-numbers: 8100,8101,8102
    max-tasks-per-process: 100

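Calling it

// Package names as in JODConverter 4.2.x
import java.io.File;
import org.jodconverter.DocumentConverter;
import org.jodconverter.office.OfficeException;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/doc")
public class LibreOfficeController {
    // Provided by jodconverter-spring-boot-starter
    @Autowired
    private DocumentConverter documentConverter;

    @RequestMapping("/toPdf")
    public void toPdf() throws OfficeException {
        File wfile = new File("/usr/local/office_package/5.docx");
        File pfile = new File("/usr/local/office_package/5.pdf");
        // The target format is inferred from the output file extension
        documentConverter.convert(wfile).to(pfile).execute();
    }
}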

Option 2 (running the command)

// Assumes an SLF4J field named logger and an import of java.io.IOException.
/**
 * Convert an office document to PDF with LibreOffice.
 * @param inputFile path of the source document
 * @param pdfFile   output directory for the generated PDF
 * @return true if the soffice command exited with status 0
 */
public static boolean convertOffice2PDF(String inputFile, String pdfFile) {
    long start = System.currentTimeMillis();
    String command;
    String osName = System.getProperty("os.name");
    if (osName.contains("Windows")) {
        command = "cmd /c start soffice --headless --invisible --convert-to pdf " + inputFile + " --outdir " + pdfFile;
    } else {
        command = "soffice --headless --invisible --convert-to pdf " + inputFile + " --outdir " + pdfFile;
    }
    boolean flag = executeLibreOfficeCommand(command);
    long end = System.currentTimeMillis();
    logger.debug("Elapsed: {} ms", end - start);
    return flag;
}


/**
 * Run a command line and wait for it to finish.
 * @param command the full command line
 * @return true if the process exited with status 0
 */
public static boolean executeLibreOfficeCommand(String command) {
    logger.info("Conversion started");
    Process process; // handle for controlling the child process and querying its state
    try {
        logger.debug("convertOffice2PDF cmd : {}", command);
        // exec() starts a child process for the command and returns its Process handle.
        // Note: exec(String) splits on whitespace, so paths containing spaces will break.
        process = Runtime.getRuntime().exec(command);
        // The child's output streams are available if needed:
        // InputStream errorStream = process.getErrorStream();
        // InputStream inputStream = process.getInputStream();
    } catch (IOException e) {
        logger.error("convertOffice2PDF {} error", command, e);
        return false;
    }
    int exitStatus;
    try {
        // Block until the child process exits; 0 means it finished normally.
        exitStatus = process.waitFor();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag
        logger.error("InterruptedException convertOffice2PDF {}", command, e);
        return false;
    }
    if (exitStatus != 0) {
        logger.error("convertOffice2PDF cmd exitStatus {}", exitStatus);
    } else {
        logger.debug("convertOffice2PDF cmd exitStatus {}", exitStatus);
    }
    process.destroy(); // release the child process
    logger.info("Conversion finished");
    // Report failure when the command exits with a non-zero status.
    return exitStatus == 0;
}

Option 3

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.4</version>
</dependency>

<dependency>
    <groupId>cn.keking</groupId>
    <artifactId>jodconverter-core</artifactId>
    <version>1.0-SNAPSHOT</version>
    <exclusions>
        <exclusion>
            <artifactId>commons-io</artifactId>
            <groupId>commons-io</groupId>
        </exclusion>
    </exclusions>
</dependency>

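Conclusions

  • Missing fonts cause the converted output to differ from the original
  • WPS files cannot be converted, whether they have been saved as doc or as docx
  • When images use the inline layout, some of them are lost during conversion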

Installing OpenOffice

Quick install

Copy the fonts to /usr/share/fonts/chinese

Font download link: https://pan.baidu.com/s/1kUNQ1Uqeu0bUczrv25HAEQ Password: r84g

Install

apt-get install -y locales && apt-get install -y language-pack-zh-hans &&\
localedef -i zh_CN -c -f UTF-8 -A /usr/share/locale/locale.alias zh_CN.UTF-8 && locale-gen zh_CN.UTF-8 &&\
apt-get install -y tzdata && ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime &&\
apt-get install -y libxrender1 &&\
apt-get install -y libxt6 &&\
apt-get install -y libxext-dev &&\
apt-get install -y libfreetype6-dev &&\
apt-get install -y ttf-mscorefonts-installer &&\
apt-get install -y fontconfig &&\
cd /tmp &&\
wget https://kkfileview.keking.cn/server-jre-8u251-linux-x64.tar.gz &&\
tar -zxf /tmp/server-jre-8u251-linux-x64.tar.gz &&\
mv /tmp/jdk1.8.0_251 /usr/local/ &&\
cd /tmp &&\
wget https://kkfileview.keking.cn/Apache_OpenOffice_4.1.6_Linux_x86-64_install-deb_zh-CN.tar.gz -cO openoffice_deb.tar.gz &&\
tar -zxf /tmp/openoffice_deb.tar.gz &&\
cd /tmp/zh-CN/DEBS &&\
dpkg -i *.deb && dpkg -i desktop-integration/openoffice4.1-debian-menus_4.1.6-9790_all.deb &&\
rm -rf /tmp/* && rm -rf /var/lib/apt/lists/* &&\
cd /usr/share/fonts/chinese &&\
mkfontscale &&\
mkfontdir &&\
fc-cache -fv

Open /etc/profile

vi /etc/profile

Append the following to the end of the file:

export JAVA_HOME=/usr/local/jdk1.8.0_251
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

Apply the configuration immediately

source /etc/profile

Installing version 4.1.7

This was the latest version as of June 19, 2020.

wget https://jaist.dl.sourceforge.net/project/openofficeorg.mirror/4.1.7/binaries/zh-CN/Apache_OpenOffice_4.1.7_Linux_x86-64_install-deb_zh-CN.tar.gz &&\
tar -zxf Apache_OpenOffice_4.1.7_Linux_x86-64_install-deb_zh-CN.tar.gz &&\
cd ./zh-CN/DEBS &&\
dpkg -i *.deb && dpkg -i desktop-integration/openoffice4.1-debian-menus_4.1.7-9800_all.deb

Run

nohup /opt/openoffice4/program/soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard &
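
Because the service is started in the background, a script that converts right afterwards may run before the listener is ready. The sketch below polls the port from the -accept option above using bash's /dev/tcp redirection (a bash feature, so this will not work under plain sh). port_open and wait_for_port are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Succeed if a TCP connection to host:port can be opened.
# port_open and wait_for_port are hypothetical helpers, for illustration.
port_open() {
  local host="$1" port="$2"
  # The subshell opens fd 3 on the socket and closes it again when it exits.
  (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}

# Poll once per second until the port answers or the attempts run out.
wait_for_port() {
  local host="$1" port="$2" tries="${3:-30}"
  while [ "$tries" -gt 0 ]; do
    port_open "$host" "$port" && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

Usage: wait_for_port 127.0.0.1 8100 30 || echo "OpenOffice is not listening on port 8100"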

Start on boot

vi /etc/rc.local

Add the following line:

nohup /opt/openoffice4/program/soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard &

Check the process

ps -ef|grep openoffice

List the installed packages

dpkg -l|grep openoffice

Convert

Download location

https://sourceforge.net/projects/jodconverter/files/JODConverter/

Extract the archive, enter the bin directory inside it, and run:

java -jar jodconverter-cli-2.2.2.jar /usr/local/office_package/5.docx /usr/local/office_package/5.pdf

Uninstall

sudo apt-get remove -y 'openoffice*' &&\
sudo apt purge -y 'openoffice*' &&\
sudo apt -y autoremove