Building a Crawler Client with Electron + Vue, Part 2: Automatically Downloading Files from Web Pages

Preface

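There are two common ways to develop with Electron:

  1. Create the project with the Vue CLI, then add the vue-cli-plugin-electron-builder plugin to build a Vue single-page application.
  2. Load a local HTML file directly; the HTML can reference vue.js.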

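Each approach has trade-offs. The first suits single-window projects and makes it easy to package the front end. With multiple windows, however, every window loads the entire Vue project, and because the Vue CLI generates a single-page application, the router and the Vuex store are initialized again in each window. For a small standalone application, loading local HTML is still the better choice.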

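Many sites require user authentication before files can be downloaded automatically, and the authentication information is kept in cookies. With Electron + Vue (scaffolded with the Vue CLI), requests automatically carry the cookies from the login performed in the webview. With a directly loaded local HTML file, you have to read the cookies yourself and add them to each request. To read elements from the loaded page we also need to inject a preload.js; in the Electron + Vue setup its path must be given with the file:// protocol, and a relative path does not work. After testing both approaches I chose to load local HTML directly.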

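Note

This article is based on Electron 9.0. From Electron 10 onward the remote module is disabled by default; it was deprecated in Electron 12 and removed in Electron 14, so newer versions have to install it separately as @electron/remote.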

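Creating the project

The graphical interface is the easiest way to create the project, and it also makes installing plugins convenient.

    vue ui

Install the plugin

    vue-cli-plugin-electron-builder

Plugin website: https://nklayman.github.io/vue-cli-plugin-electron-builder/

For "Choose Electron Version", the default is fine.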

Error on startup

    INFO Launching Electron…
    Failed to fetch extension, trying 4 more times
    Failed to fetch extension, trying 3 more times
    Failed to fetch extension, trying 2 more times
    Failed to fetch extension, trying 1 more times
    Failed to fetch extension, trying 0 more times
    Vue Devtools failed to install: Error: net::ERR_CONNECTION_TIMED_OUT

This happens because Vue Devtools is downloaded from a server that cannot be reached without a proxy on some networks (for example from mainland China).

Commenting out the following code in src/background.js fixes it:

    if (isDevelopment && !process.env.IS_TEST) {
      // Install Vue Devtools
      try {
        await installVueDevtools();
      } catch (e) {
        console.error("Vue Devtools failed to install:", e.toString());
      }
    }

Detaching the DevTools window

    win.webContents.openDevTools({ mode: "detach" });

Installing the Vue Devtools extension locally

Extension download

Link: https://pan.baidu.com/s/19BzaBnZsWZxN_thHHvSYBw
Extraction code: psvmc

Put the extension folder in the project root.

    const path = require("path");

    app.on("ready", async () => {
      if (isDevelopment && !process.env.IS_TEST) {
        try {
          // Install vue-devtools from the local folder
          const { session } = require("electron");
          // loadExtension returns a Promise; await it so the catch block can see failures
          await session.defaultSession.loadExtension(
            path.resolve(__dirname, "../vue-devtools") // extension directory
          );
        } catch (e) {
          console.error("Vue Devtools failed to install:", e.toString());
        }
      }
      createWindow();
    });

Hiding the menu bar

    import { Menu } from "electron";
    Menu.setApplicationMenu(null);

Setting the title

Before a page is loaded the window shows its own title; once a page is loaded it shows the page's title. It is best to set both.

Window title

    const win = new BrowserWindow({
      width: 1200,
      height: 600,
      title: "新华字典爬虫", // "Xinhua Dictionary Crawler"
      webPreferences: {
        webviewTag: true,
        webSecurity: false,
        enableRemoteModule: true,
        nodeIntegration: true,
        contextIsolation: false,
      },
    });

Page title

    <head>
      <meta charset="utf-8">
      <meta http-equiv="X-UA-Compatible" content="IE=edge">
      <meta name="viewport" content="width=device-width,initial-scale=1.0">
      <link rel="icon" href="<%= BASE_URL %>favicon.ico">
      <title>新华字典爬虫</title>
    </head>

Loading the page

Official documentation: https://www.electronjs.org/docs/api/webview-tag

Add a webview to the page

    <webview
      ref="mwv"
      class="webview"
      nodeintegration
      disablewebsecurity
    ></webview>

Enable the webview tag in the window configuration

    const win = new BrowserWindow({
      width: 1200,
      height: 600,
      webPreferences: {
        webviewTag: true, // required for <webview> to work
        webSecurity: false,
        enableRemoteModule: true,
        nodeIntegration: true,
        contextIsolation: false,
      },
    });

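Preload

Notes

  1. An Electron-Vue project loads its page from a URL at runtime, so preload.js must be loaded with the file:// protocol.
  2. Set preload before opening the page; setting both at the same time also works.
  3. Functions in preload.js must not return DOM objects, otherwise an error is raised.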
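
Since a preload function has to hand back plain data rather than DOM nodes, one option is to copy the fields you need into ordinary objects before returning. The following is a minimal sketch of that idea; `serializeLinks` and the `fakeAnchors` stand-ins are hypothetical names introduced here, not part of the original project.

```javascript
// Hypothetical helper: copy the interesting fields of anchor elements into
// plain objects, which survive JSON serialization and IPC transfer.
function serializeLinks(anchors) {
  return Array.from(anchors).map((a) => ({
    href: a.href || "",
    text: (a.textContent || "").trim(),
  }));
}

// Inside a preload script this could be used roughly as:
//   window.showData = () =>
//     JSON.stringify(serializeLinks(document.getElementsByTagName("a")));

// Stand-in objects so the sketch also runs outside a browser.
const fakeAnchors = [
  { href: "https://example.com/a", textContent: "  First " },
  { href: "https://example.com/b", textContent: "Second" },
];
console.log(JSON.stringify(serializeLinks(fakeAnchors)));
```

Returning a JSON string keeps the result independent of how the webview serializes values across processes.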

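Getting the path in the renderer page

Put the file under public.

The script to load is referenced by an absolute file:// URL, for example:

    file:///Users/zhangjian/psvmc/app/me/web/91crawler2/public/mypreload.js

mypreload.js

The file is in the public folder in the project root.

    window.showData = function () {
      const a_arr = document.getElementsByTagName("a");
      console.info(a_arr);
      // Do not return a_arr itself: DOM collections cannot be sent over IPC,
      // and JSON.stringify on elements only yields empty objects.
      const links = Array.from(a_arr).map(function (a) {
        return { href: a.href, text: a.textContent };
      });
      return JSON.stringify(links);
    };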

Loading the script and the page

    const { app } = window.require("electron").remote;
    export default {
      data() {
        return {
          weburl: "https://zidian.aies.cn/pinyin_index.htm",
          preload_path: "",
          url_arr: [],
        };
      },
      name: "Home",
      components: {},
      mounted() {
        // Set preload first, then open the page
        this.preloadaction();
        this.openUrl();
      },
      methods: {
        preloadaction: function () {
          const path = require("path");
          // Convert Windows backslashes to forward slashes before joining
          let appPathNew = app.getAppPath().split("\\").join("/");
          this.preload_path = path.join(appPathNew, "..", "public", "mypreload.js");
          const mwv = this.$refs["mwv"];
          mwv.preload = "file://" + this.preload_path;
        },
        openUrl: function () {
          const mwv = this.$refs["mwv"];
          mwv.src = this.weburl;
          mwv.addEventListener("dom-ready", () => {
            mwv.openDevTools();
          });
        },
        readdata() {
          const mwv = this.$refs["mwv"];
          // Call the function that mypreload.js attached to window
          mwv.executeJavaScript("showData();").then(function (data) {
            console.info(data);
          });
        },
      },
    };

Calling the injected function and reading its return value

    myfun: function () {
      const mwv = this.$refs["mwv"];
      mwv.executeJavaScript("showData();").then(function (data) {
        console.info("data", data);
      });
    }

Setting the path in the main process

mypreload.js

The file is in the public folder in the project root.

    function createWindow () {
      // Expose the absolute preload path to renderer processes
      global.shareObject = {
        preload_path: path.join(
          __dirname,
          "..",
          "public",
          "mypreload.js"
        ),
      };
      mainWindow = new BrowserWindow(...)
    }

In the renderer page

    this.preload_path = remote.getGlobal("shareObject").preload_path;

Packaging configuration

If you are not using vue-cli-plugin-electron-builder, add the packaging tool and its configuration yourself.

electron-builder

    npm install electron-builder --save-dev

Add the following to package.json:

    "build": {
      "appId": "com.xxx.app",
      "mac": {
        "target": ["dmg", "zip"]
      },
      "win": {
        "target": ["nsis", "zip"]
      }
    },
    "scripts": {
      "dist": "electron-builder --win --x64"
    },

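Packaging notes

  • Building the nsis target on Windows requires the icon to be in ico format.
  • The icon should be 512*512, which the macOS build requires.
  • The LICENSE.txt used for nsis must not be empty, and it must be GBK-encoded, otherwise the text is garbled.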

When dependencies cannot be downloaded

Download them manually and put them in the cache directory for your platform.

Mac

    cd ~/Library/Caches/electron-builder

Linux

    cd ~/.cache/electron-builder

Windows

    cd %LOCALAPPDATA%\electron-builder\cache

Download location

Link: https://pan.baidu.com/s/15YnlnLIrt_16Xvrmo3LLMQ Password: 0hk3

Files to download

    nsis-3.0.4.1.7z
    nsis-resources-3.4.1.7z
    winCodeSign-2.6.0.7z
    wine-4.0.1-mac.7z

Final directory structure

  • nsis
    • nsis-3.0.4.1
    • nsis-resources-3.4.1
  • winCodeSign
    • winCodeSign-2.6.0
  • wine
    • wine-4.0.1-mac
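
To confirm that the unpacked archives ended up where the layout above expects them, a small Node.js check can help. This is only a sketch: `builderCacheDir` and `missingToolDirs` are hypothetical helpers, and the directory names are simply the versions listed in this article.

```javascript
const fs = require("fs");
const path = require("path");

// Cache directory per platform, as listed in this article.
function builderCacheDir(platform, home, localAppData) {
  if (platform === "darwin") {
    return path.posix.join(home, "Library", "Caches", "electron-builder");
  }
  if (platform === "win32") {
    return path.win32.join(localAppData, "electron-builder", "cache");
  }
  return path.posix.join(home, ".cache", "electron-builder");
}

// Directory layout from the article (versions are the ones listed there).
const EXPECTED_DIRS = [
  ["nsis", "nsis-3.0.4.1"],
  ["nsis", "nsis-resources-3.4.1"],
  ["winCodeSign", "winCodeSign-2.6.0"],
  ["wine", "wine-4.0.1-mac"],
];

// Returns the expected directories that do not exist yet.
// `exists` is injectable so the function can be tried without touching disk.
function missingToolDirs(cacheDir, exists = fs.existsSync) {
  return EXPECTED_DIRS
    .map((parts) => path.join(cacheDir, ...parts))
    .filter((dir) => !exists(dir));
}

console.log(builderCacheDir("linux", "/home/me", ""));
```

Calling `missingToolDirs(builderCacheDir(process.platform, os.homedir(), process.env.LOCALAPPDATA))` (with `os` required) would list whatever still needs to be unpacked on the current machine.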

Getting the page's cookies

Page

    <webview
      ref="mwv"
      class="webview"
      partition="persist:psvmc"
      nodeintegration
      disablewebsecurity
    ></webview>

JS

    const { session } = window.require("electron").remote;

    // Use the same partition name as the webview
    const ses = session.fromPartition("persist:psvmc");
    ses.cookies
      .get({ url: "http://www.psvmc.cn" })
      .then(function (cookies) {
        console.log(cookies);
      });

The default session can be used as well

    <webview
      ref="mwv"
      class="webview"
      nodeintegration
      disablewebsecurity
    ></webview>

JS

    const { session } = window.require("electron").remote;

    const ses = session.defaultSession;
    ses.cookies
      .get({ url: "http://www.psvmc.cn" })
      .then(function (cookies) {
        console.log(cookies);
      });

Note

The webview and the outer BrowserWindow can share the same session and cookies.

Choosing a file with a dialog

    // dialog comes from window.require("electron").remote
    selectFile: function () {
      const that = this;
      dialog
        .showOpenDialog({
          properties: ["openFile", "openDirectory"],
        })
        .then((result) => {
          if (!result.canceled) {
            that.outpath = result.filePaths[0];
          }
        })
        .catch((err) => {
          console.log(err);
        });
    },

Downloading files

There are two ways to download files.

Method 1: let the browser download

    downloadfileByUrl: function (murl) {
      session.defaultSession.downloadURL(murl);
    },

Monitoring download progress

    session.defaultSession.on("will-download", (event, item) => {
      const filePath = path.join(
        app.getPath("documents"),
        item.getFilename()
      );
      console.info("About to download:", filePath);
      // item.setSavePath(filePath);
      item.on("updated", (event, state) => {
        if (state === "interrupted") {
          console.log("Download is interrupted but can be resumed");
        } else if (state === "progressing") {
          if (item) {
            const dp = (item.getReceivedBytes() * 100) / item.getTotalBytes();
            console.log(`${item.getFilename()}: ${dp}%`);
          }
        }
      });
      item.once("done", (event, state) => {
        if (state === "completed") {
          console.log("Download complete");
        } else {
          console.log(`Download failed: ${state}`);
        }
      });
    });

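According to the official documentation, setting the save path suppresses the save dialog, but this has no effect in the renderer process (addendum: it does work in the main process).

    item.setSavePath(filePath);

Pros and cons

This approach guarantees that Chinese file names are not garbled. However, the officially documented approach of cancelling the default download and then downloading manually did not work for me at first. It turned out that a will-download handler registered on the session from the renderer process can neither cancel the download nor suppress the dialog, while a handler in the main process can.

In other words, the renderer process can read the download progress but cannot set the save location.

So when the real download address is only known after a redirect, the workable options are:

  • Set the save location in the main process and read the download progress in the renderer process.
  • Let the main process obtain the real download address, call event.preventDefault() to cancel the default download, and download manually with Node.js.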

Configuration in the main process

    const path = require("path");
    win.webContents.session.on("will-download", (event, item) => {
      // Setting the save path here suppresses the save dialog
      const filePath = path.join(app.getPath("downloads"), item.getFilename());
      item.setSavePath(filePath);

      item.on("updated", (event, state) => {
        if (state === "interrupted") {
          console.log("Download is interrupted but can be resumed");
        } else if (state === "progressing") {
          if (item.isPaused()) {
            console.log("Download is paused");
          } else {
            console.log(`Received bytes: ${item.getReceivedBytes()}`);
          }
        }
      });
      item.once("done", (event, state) => {
        if (state === "completed") {
          console.log("Download successfully");
        } else {
          console.log(`Download failed: ${state}`);
        }
      });
    });

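Get the download address, cancel the download, and pass the address on to the renderer process

    win.webContents.session.on("will-download", (event, item) => {
      let fileURL = item.getURL();
      let fileName = item.getFilename();
      // Cancel Electron's own download; fileURL and fileName can then be forwarded
      event.preventDefault();
    });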
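
The handler above stops at `event.preventDefault()`; the step of handing the address to the renderer is left open. One possible way to fill it in is sketched below. The channel name "download-url" and the helper `createWillDownloadHandler` are made up for illustration, and the `send` function is injected so the sketch can be exercised without Electron.

```javascript
// Builds a will-download listener. In the main process `send` would be
// something like: (channel, payload) => win.webContents.send(channel, payload)
function createWillDownloadHandler(send) {
  return (event, item) => {
    // Read what we need before cancelling the download
    const payload = { url: item.getURL(), filename: item.getFilename() };
    event.preventDefault(); // cancel Electron's built-in download
    send("download-url", payload); // hypothetical channel name
  };
}

// Assumed wiring in the main process:
//   win.webContents.session.on(
//     "will-download",
//     createWillDownloadHandler((ch, data) => win.webContents.send(ch, data))
//   );
// Assumed wiring in the renderer:
//   ipcRenderer.on("download-url", (_event, { url }) => this.downloadfileByUrl(url));

// Quick dry run with stand-in objects
const sent = [];
let prevented = false;
createWillDownloadHandler((channel, payload) => sent.push([channel, payload]))(
  { preventDefault() { prevented = true; } },
  { getURL: () => "http://example.com/f.zip", getFilename: () => "f.zip" }
);
console.log(prevented, sent);
```

Keeping the handler free of direct Electron references is a design choice for testability; in a real app it could just as well close over `win` directly.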

Could the session objects be different?

    const { remote } = window.require("electron");
    const { app } = remote;
    const path = window.require("path");
    let webContent = remote.getCurrentWebContents();
    webContent.session.on("will-download", (event, item) => {
      const filePath = path.join(
        app.getPath("downloads"),
        item.getFilename()
      );
      item.setSavePath(filePath);
    });

Listening on webContent.session obtained in the renderer process and setting the save path in the callback still brings up the save dialog, so event.preventDefault() and item.setSavePath(filePath) only take effect in the main process.

Method 2: download with Node.js

This is the approach I currently use, and the one I recommend.

However, when the window loads a static local page, requests from that page do not share the cookies of the webview.

Because the download address may redirect, the follow-redirects library is used.

    downloadfileByUrl: function (murl) {
      const fs = window.require("fs");
      const iconv = require("iconv-lite");
      const url = require("url");
      const http = require("follow-redirects").http;
      const options = url.parse(murl);

      const request = http.request(options, (response) => {
        let filename_all = response.headers["content-disposition"];
        const file_length = response.headers["content-length"];
        let downd_length = 0;
        if (filename_all) {
          // Re-decode the header as UTF-8 to repair garbled Chinese file names
          let buffer = iconv.encode(filename_all, "iso-8859-1");
          filename_all = iconv.decode(buffer, "utf8");
          console.info(filename_all);
          let filename = filename_all.split('"')[1];
          const filepath = app.getPath("downloads") + "/" + filename;
          console.info(filepath);
          // Overwrite any previous download of the same file
          if (fs.existsSync(filepath)) {
            fs.unlinkSync(filepath);
          }
          let wstream = fs.createWriteStream(filepath);
          response.on("data", (chunk) => {
            downd_length += chunk.length;
            let down_progress =
              Math.ceil((downd_length * 100) / file_length) + "%";
            console.info("Download progress: " + down_progress);
            wstream.write(chunk);
          });
          response.on("end", function () {
            wstream.end();
            console.info(filename + " download complete");
          });
        }
      });
      request.end();
    },
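
The progress calculation in the code above relies on the content-length header, which arrives as a string and may be missing altogether. A small helper can make that case explicit. This is a sketch with a hypothetical name, `formatProgress`; it rounds down instead of up, so 100% is only reported once everything has arrived.

```javascript
// Returns a percentage string such as "25%", or null when the total size
// is unknown (no or invalid content-length header).
function formatProgress(receivedBytes, contentLengthHeader) {
  const total = Number.parseInt(contentLengthHeader, 10);
  if (!Number.isFinite(total) || total <= 0) {
    return null;
  }
  const percent = Math.min(100, Math.floor((receivedBytes * 100) / total));
  return percent + "%";
}

console.log(formatProgress(50, "200")); // "25%"
console.log(formatProgress(10, undefined)); // null
```

A `null` result can be shown in the UI as an indeterminate progress bar rather than as "NaN%".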

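Note

Download files as a stream.

Pros and cons

This approach gives you full control over where the file is saved and how the download proceeds, but watch out for garbled file names, and for requests that carry no cookies when the page is loaded directly from HTML.

Fixing garbled file names

Chinese file names read from content-disposition in Node.js come out garbled; they can be repaired like this:

    const iconv = require("iconv-lite");
    // Turn the string back into its raw bytes, then decode those bytes as UTF-8
    let buffer = iconv.encode(filename_all, "iso-8859-1");
    filename_all = iconv.decode(buffer, "utf8");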
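
The same re-decoding can be done with Node's built-in Buffer, and combined with parsing of the header itself. The sketch below assumes the server sends the file name as UTF-8 bytes (if it sends GBK or another encoding, this will not help); `filenameFromContentDisposition` is a hypothetical helper name.

```javascript
// Extracts the file name from a Content-Disposition header value.
function filenameFromContentDisposition(header) {
  if (!header) return null;
  // Extended form: filename*=UTF-8''%E6%B5%8B%E8%AF%95.txt
  const extended = /filename\*\s*=\s*UTF-8''([^;]+)/i.exec(header);
  if (extended) {
    return decodeURIComponent(extended[1].trim());
  }
  // Plain form: filename="..." (quotes optional)
  const plain = /filename\s*=\s*"?([^";]+)"?/i.exec(header);
  if (!plain) return null;
  // Node exposes header bytes as latin1 characters; re-read them as UTF-8
  return Buffer.from(plain[1], "latin1").toString("utf8");
}

// Simulate what Node would hand us for a UTF-8 file name in the header
const rawHeader =
  'attachment; filename="' +
  Buffer.from("测试.txt", "utf8").toString("latin1") +
  '"';
console.log(filenameFromContentDisposition(rawHeader));
```

Compared with `split('"')[1]`, the regular expressions also cope with unquoted names and with the `filename*=` form that some servers send.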

Setting cookies

Requests made from a local static page loaded by Electron do not carry cookies, so we have to add the Cookie header ourselves.

    getcookie: function () {
      const that = this;
      const ses = session.defaultSession;
      ses.cookies
        .get({ url: "http://www.91taoke.com" })
        .then(function (cookies) {
          console.log(cookies);
          // Keep the cookies for later download requests
          that.mcookie = cookies;
        });
    }
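
The cookie objects returned by `cookies.get` each carry a `name` and a `value`, and a Cookie request header is just those pairs joined together. A minimal sketch of that conversion follows; `buildCookieHeader` is a hypothetical helper and the sample values are invented.

```javascript
// Turns an array of { name, value } cookie objects into a Cookie header value.
function buildCookieHeader(cookies) {
  return cookies.map((c) => `${c.name}=${c.value}`).join("; ");
}

const sampleCookies = [
  { name: "sid", value: "abc123" },
  { name: "lang", value: "zh" },
];
console.log(buildCookieHeader(sampleCookies));
```

The resulting string can be assigned to `options.headers.Cookie` before the request is sent.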

Download

    downloadfileByUrl: function (murl, callback) {
      const that = this;
      const fs = window.require("fs");
      const iconv = require("iconv-lite");
      const url = require("url");
      const http = require("follow-redirects").http;
      const options = url.parse(murl);
      // Build the Cookie header from the cookies saved by getcookie()
      let mcookie = this.mcookie;
      let cookieStr = "";
      for (const mc of mcookie) {
        cookieStr += mc.name + "=" + mc.value + ";";
      }
      options.headers = {
        Cookie: cookieStr,
        Accept: "*/*",
        Connection: "keep-alive",
      };

      const request = http.request(options, (response) => {
        let filename_all = response.headers["content-disposition"];
        const file_length = response.headers["content-length"];
        let downd_length = 0;
        if (filename_all) {
          // Re-decode the header as UTF-8 to repair garbled Chinese file names
          let buffer = iconv.encode(filename_all, "iso-8859-1");
          filename_all = iconv.decode(buffer, "utf8");
          console.info(filename_all);
          let filename = filename_all.split('"')[1];
          const filepath = app.getPath("downloads") + "/" + filename;
          console.info(filepath);
          if (fs.existsSync(filepath)) {
            fs.unlinkSync(filepath);
          }
          let wstream = fs.createWriteStream(filepath);
          response.on("data", (chunk) => {
            downd_length += chunk.length;
            let down_progress =
              Math.ceil((downd_length * 100) / file_length) + "%";
            console.info("Download progress: " + down_progress);

            that.download_state.down_progress = down_progress;
            wstream.write(chunk);
          });
          response.on("end", function () {
            wstream.end();
            console.info(filename + " download complete");
            if (callback) {
              callback();
            }
          });
        } else {
          // No content-disposition header: treat it as "not allowed to download"
          that.download_state.down_progress = "0%";
          that.$nextTick(function () {
            window.alert("No permission to download: " + that.downfilename);
          });
        }
      });
      request.end();
    },

Method 3: save data through a Blob link

    function strToBlob(str, type) {
      return new Blob([str], {
        type: type,
      });
    }

    function download() {
      const blob = strToBlob("码客说", "text/plain");
      let link = document.createElement("a");
      // blob is already a Blob, so it can be passed to createObjectURL directly
      link.href = URL.createObjectURL(blob);
      link.download = "test.txt";
      document.body.appendChild(link);
      link.click();
      URL.revokeObjectURL(link.href);
    }

base64 => Blob

    // sliceSize needs a default: without one the loop exits after an empty first slice
    const base64toBlob = (base64Data, contentType = "", sliceSize = 512) => {
      const byteCharacters = atob(base64Data);
      const byteArrays = [];

      for (let offset = 0; offset < byteCharacters.length; offset += sliceSize) {
        const slice = byteCharacters.slice(offset, offset + sliceSize);

        const byteNumbers = new Array(slice.length);
        for (let i = 0; i < slice.length; i++) {
          byteNumbers[i] = slice.charCodeAt(i);
        }

        const byteArray = new Uint8Array(byteNumbers);
        byteArrays.push(byteArray);
      }

      const blob = new Blob(byteArrays, { type: contentType });
      return blob;
    };
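
For small payloads the slicing is not strictly needed; the decoded string can be mapped to bytes in one step. The following sketch shows that shorter variant. `base64ToBytes` is a hypothetical name, and it assumes an environment where `atob` is available (browsers, Electron renderers, recent Node.js).

```javascript
// Decodes a base64 string into a Uint8Array in a single pass.
function base64ToBytes(base64Data) {
  const binary = atob(base64Data);
  // Each character of `binary` holds one byte value (0-255)
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

const bytes = base64ToBytes("aGVsbG8="); // base64 of "hello"
console.log(Array.from(bytes));

// In the renderer the bytes can then be wrapped as before:
//   const blob = new Blob([bytes], { type: "text/plain" });
```

The sliced version remains preferable for very large inputs, since it avoids building one huge intermediate array.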

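Loading files inside the project

All the ways of getting the project path

    console.info(app.getAppPath());
    console.info(__dirname);
    console.info(process.execPath);
    console.info(process.cwd());

The results show that

process.execPath always returns browser

process.cwd() always returns /

so these two are of no use to us.

__dirname gives an unusable path in development, while after packaging it points inside the application, something like <project directory>/resources/app.asar. Since both development and packaged builds have to work, it cannot be used either.

The only useful one is app.getAppPath(), but the slash direction differs between Windows and Mac, so joining paths is still a problem.

For example, the script I want to load is public/mypreload.js in the project root.

The absolute path can be obtained in code like this:

    const path = require("path");
    this.preload_path = path.join(
      app.getAppPath(),
      "..",
      "public",
      "mypreload.js"
    );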

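On Windows, however, the slashes point the other way, and in my tests path.join([...paths]) always produced a wrong path. path.normalize(path) did not help either, and path.sep, which the official documentation describes as the platform-specific path segment separator, had no effect. Replacing the backslashes by hand before joining works:

    const path = require("path");
    // Convert Windows backslashes to forward slashes before joining
    let appPathNew = app
      .getAppPath()
      .split("\\")
      .join("/");
    this.preload_path = path.join(appPathNew, "..", "public", "mypreload.js");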
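
Because the preload path is ultimately used as a file:// URL, it can help to build that URL in one place and handle the Windows drive-letter case explicitly. The sketch below does this by hand; `toFileUrl` is a hypothetical helper, and it only escapes what `encodeURI` escapes, so paths containing `#` or `?` would need extra care.

```javascript
// Converts an absolute Windows or POSIX path into a file:// URL.
function toFileUrl(absPath) {
  const forward = absPath.split("\\").join("/");
  // "C:/..." needs a leading slash so the URL becomes file:///C:/...
  const withRoot = forward.startsWith("/") ? forward : "/" + forward;
  return "file://" + encodeURI(withRoot);
}

console.log(toFileUrl("/Users/me/app/public/mypreload.js"));
// file:///Users/me/app/public/mypreload.js
console.log(toFileUrl("C:\\my app\\public\\mypreload.js"));
// file:///C:/my%20app/public/mypreload.js
```

With such a helper the webview assignment would read `mwv.preload = toFileUrl(this.preload_path)` instead of concatenating "file://" manually.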

The build configuration also needs an entry so that the file is copied next to the application instead of being packed into it.

Configure this in vue.config.js in the project root; create the file if it does not exist.

    module.exports = {
      pluginOptions: {
        electronBuilder: {
          builderOptions: {
            appId: "cn.psvmc.crawler",
            productName: "91资源下载器",
            icon: "./app.ico",
            // Copy the preload script into the resources directory as-is
            extraResources: {
              from: "./public/mypreload.js",
              to: "./public/mypreload.js"
            },
            asar: true,
            mac: {
              icon: "./app.ico",
              target: ["zip", "dmg"]
            },
            win: {
              icon: "./app.ico",
              target: ["zip", "nsis"]
            },
            nsis: {
              oneClick: false,
              allowElevation: true,
              allowToChangeInstallationDirectory: true,
              installerIcon: "./app.ico",
              uninstallerIcon: "./app.ico",
              installerHeaderIcon: "./app.ico",
              createDesktopShortcut: true,
              createStartMenuShortcut: true,
              license: "./LICENSE.txt"
            }
          }
        }
      }
    };

The key part

    extraResources: {
      from: "./public/mypreload.js",
      to: "./public/mypreload.js"
    },
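
With this extraResources entry the script is copied into the application's resources directory, which Electron exposes as `process.resourcesPath`. The sketch below shows one way to pick the right location in both modes; `resolvePreloadPath` is a hypothetical helper, and the development branch assumes, as in this article, that the project root is one level above `app.getAppPath()`.

```javascript
const path = require("path");

// `env` would be filled from Electron in the main process, for example:
//   { isPackaged: app.isPackaged, appPath: app.getAppPath(),
//     resourcesPath: process.resourcesPath }
// `join` is injectable only so the example prints the same on every OS.
function resolvePreloadPath(env, join = path.join) {
  const base = env.isPackaged ? env.resourcesPath : join(env.appPath, "..");
  return join(base, "public", "mypreload.js");
}

console.log(
  resolvePreloadPath(
    { isPackaged: false, appPath: "/proj/dist_electron", resourcesPath: "" },
    path.posix.join
  )
); // /proj/public/mypreload.js
console.log(
  resolvePreloadPath(
    {
      isPackaged: true,
      appPath: "/opt/App/resources/app.asar",
      resourcesPath: "/opt/App/resources",
    },
    path.posix.join
  )
); // /opt/App/resources/public/mypreload.js
```

Branching on `isPackaged` makes the intent explicit instead of relying on `..` happening to point at the right folder in both cases.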

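The difference between path.join() and path.resolve()

Import

    const path = window.require('path');

Comparison

  • path.join() concatenates the segments with the platform's separator and normalizes the result.
  • path.resolve() behaves like running cd on each segment in turn: a segment that starts with / jumps back to the root, and the result is always an absolute path.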
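
The difference is easiest to see with a segment that starts with a slash. The example below uses `path.posix` so that the output is the same on every operating system; the paths themselves are invented for illustration.

```javascript
const posixPath = require("path").posix;

// Relative segments: both give the same result
const joined = posixPath.join("/app/dist", "../public", "mypreload.js");
const resolved = posixPath.resolve("/app/dist", "../public", "mypreload.js");
console.log(joined); // /app/public/mypreload.js
console.log(resolved); // /app/public/mypreload.js

// A segment starting with "/": join keeps appending, resolve restarts at the root
const joinedAbs = posixPath.join("/app/dist", "/public", "mypreload.js");
const resolvedAbs = posixPath.resolve("/app/dist", "/public", "mypreload.js");
console.log(joinedAbs); // /app/dist/public/mypreload.js
console.log(resolvedAbs); // /public/mypreload.js
```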

remote no longer works

As of Electron 12 the built-in remote module is deprecated. The @electron/remote README explains:

NOTE: @electron/remote requires Electron 10 or higher.

The remote module is deprecated. Instead of remote, use ipcRenderer and ipcMain.

Read more about why the remote module is deprecated here.

If you still want to use remote despite the performance and security concerns, see @electron/remote.

Usage

Install the dependency

    $ npm install --save @electron/remote

Initialize it in the main process

    // in the main process:
    require('@electron/remote/main').initialize()

Replace the code in the renderer process

    // in the renderer process:

    // Before
    const { BrowserWindow } = require('electron').remote

    // After
    const { BrowserWindow } = require('@electron/remote')

Data requests

    npm install axios

Import

    import axios from "axios";

Install qs

    npm i qs -S

Request

    import qs from 'qs';
    const data = { phone: 'edward', password: '25' }; // what we pass in is a JS object
    const options = {
      method: 'POST',
      headers: { 'content-type': 'application/x-www-form-urlencoded' },
      data: qs.stringify(data), // qs turns the object into the string 'phone=edward&password=25'
      url: 'http://www.psvmc.cn'
    };
    axios(options);

Or simply do it like this:

    const config = {
      headers: {
        APPID: this.APPID,
        CLIENTID: this.CLIENTID,
        CLIENTSECRET: this.CLIENTSECRET,
      },
    };
    // The form-encoded string is the request body; config carries the custom headers
    let res = await axios.post(
      "https://www.psvmc.cn/api/users/sign_in",
      qs.stringify({
        phone,
        password,
      }),
      config
    );

    let result = res.data;
    if (result && result.success) {
      this.ACCESSTOKEN = result.data.token;
    }
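
For flat objects like the login form above, the built-in `URLSearchParams` class produces the same form-encoded string as `qs.stringify`, so it can serve as a dependency-free alternative. This is a sketch with invented sample values; nested objects or arrays are where qs still has the edge.

```javascript
// Form-encode a flat object without an extra dependency
const formData = { phone: "edward", password: "25" };
const body = new URLSearchParams(formData).toString();
console.log(body); // phone=edward&password=25

// The string can be passed to axios in place of qs.stringify(...):
//   axios.post(url, body, config)
```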

Fixing cross-origin request errors

    webPreferences: {
      webSecurity: false,
    },

Appendix

package.json

    {
      "name": "cidian",
      "version": "0.1.0",
      "private": true,
      "scripts": {
        "electron:build": "vue-cli-service electron:build",
        "electron:serve": "vue-cli-service electron:serve"
      },
      "main": "background.js",
      "dependencies": {
        "async": "^3.2.0",
        "cheerio": "^1.0.0-rc.3",
        "core-js": "^3.6.5",
        "fs-extra": "^9.0.1",
        "md5": "^2.3.0",
        "node-fetch": "^2.6.1",
        "readline": "^1.3.0",
        "request": "^2.88.2",
        "superagent": "^6.1.0",
        "vue": "^2.6.11"
      },
      "devDependencies": {
        "@vue/cli-plugin-babel": "~4.5.0",
        "@vue/cli-plugin-eslint": "~4.5.0",
        "@vue/cli-service": "~4.5.0",
        "@vue/eslint-config-prettier": "^6.0.0",
        "babel-eslint": "^10.1.0",
        "electron": "9.0.0",
        "eslint": "^6.7.2",
        "eslint-plugin-prettier": "^3.1.3",
        "eslint-plugin-vue": "^6.2.2",
        "less": "^3.0.4",
        "less-loader": "^5.0.0",
        "prettier": "^1.19.1",
        "vue-cli-plugin-electron-builder": "^2.0.0-rc.5",
        "vue-template-compiler": "^2.6.11"
      },
      "eslintConfig": {
        "root": true,
        "env": {
          "node": true
        },
        "extends": [
          "plugin:vue/essential",
          "eslint:recommended",
          "@vue/prettier"
        ],
        "parserOptions": {
          "parser": "babel-eslint"
        },
        "rules": {}
      },
      "browserslist": [
        "> 1%",
        "last 2 versions",
        "not dead"
      ]
    }