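Preface

This article uses Python 3.7.

I previously used KNN for character recognition, but the results were not particularly good, so this article tries a CNN instead.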

https://www.psvmc.cn/article/2024-09-05-opencv-num-recognize-knn-python.html

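Test results

The CNN recognized characters slightly better than KNN, but recognition was somewhat slower.

Trained on the same material, the KNN model is larger than the CNN model.

Appendix: Alibaba Cloud recognition

Postscript

For simple character recognition, the CNN offered no significant advantage over KNN, while both training and recognition were considerably slower, so in the end I stayed with KNN.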

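Recommended image sizes for character recognition

Simple character recognition tasks typically use one of the following sizes:

28x28 pixels: the standard size of the MNIST handwritten digit dataset; suitable for simple character recognition tasks.
32x32 pixels: the image size used by the CIFAR-10 dataset; suitable for simple character recognition tasks that need slightly more detail.
64x64 pixels: suitable for tasks that need more detail or involve more complex characters.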
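
To make the trade-off between these sizes concrete, the small sketch below counts how many pixel values one grayscale image contributes at each size, relative to 28x28. The helper name `pixel_count` is my own and does not come from the project code.

```python
def pixel_count(size: int, channels: int = 1) -> int:
    """Number of values in one square image with the given side length."""
    return size * size * channels


baseline = pixel_count(28)
for size in (28, 32, 64):
    count = pixel_count(size)
    # The ratio shows how much more input the network has to process per image.
    print(f"{size}x{size}: {count} values, {count / baseline:.2f}x the 28x28 input")
```

Going from 32x32 to 64x64 quadruples the input, which is one reason to pick the smallest size that still preserves the strokes of the characters.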

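Adding dependencies

Initialize the environment in the project directory

pipenv install

To implement the character recognition task, you need to install the following dependencies.

They include OpenCV for image processing, the deep learning framework TensorFlow, and the data-processing libraries NumPy and Scikit-learn.

Overview

OpenCV is an open-source computer vision library that provides a rich set of image processing functions.
TensorFlow is a popular deep learning framework for building and training neural network models.
NumPy is a foundational library for scientific computing that provides a high-performance multidimensional array object and tools for working with such arrays.
Scikit-learn is a machine learning library that provides data splitting, model evaluation, and similar utilities.

Installing dependencies

Pay close attention to the versions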

pipenv install opencv-python==4.5.4.60 
pipenv install tensorflow-cpu==2.3.0
pipenv install scikit-learn==0.24.2

Uninstall them when they are no longer needed

pipenv uninstall tensorflow-cpu
pipenv uninstall scikit-learn

GPU version of TensorFlow

pipenv install tensorflow==2.3.0

Alternatively, edit the Pipfile

[[source]]
url = "https://pypi.tuna.tsinghua.edu.cn/simple"
verify_ssl = true
name = "pip_conf_index_global"

[packages]
opencv-python = "==4.5.4.60"
tensorflow-cpu = "==2.3.0"
scikit-learn = "==0.24.2"

[dev-packages]

[requires]
python_version = "3.7"

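After editing, run the install

pipenv install

Verifying the installation

Once installation finishes, you can confirm in a Python environment that the libraries are installed correctly: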

import cv2
import numpy as np
import sklearn
import tensorflow as tf

# Confirm the OpenCV installation
print("OpenCV", cv2.__version__)

# Confirm the NumPy installation
print("Numpy", np.__version__)

# Confirm the TensorFlow installation
print("Tensorflow", tf.__version__)

# Confirm the Scikit-learn installation
print("Sklearn", sklearn.__version__)
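
Because the article stresses pinning versions, it can help to compare what is installed against the pins rather than only printing them. The following is a minimal sketch of such a comparison; the function name `find_version_mismatches` is hypothetical and the `installed` values are made-up example data. How you collect the real values (for example from `pipenv run pip freeze`) is left open.

```python
def find_version_mismatches(expected, installed):
    """Return {name: (expected, installed)} for every package whose version differs."""
    mismatches = {}
    for name, wanted in expected.items():
        found = installed.get(name)  # None when the package is missing
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Pinned versions taken from the install commands above.
expected = {
    "opencv-python": "4.5.4.60",
    "tensorflow-cpu": "2.3.0",
    "scikit-learn": "0.24.2",
}
# Example data only, chosen so that one package is off by a patch version.
installed = {
    "opencv-python": "4.5.4.60",
    "tensorflow-cpu": "2.3.0",
    "scikit-learn": "0.24.1",
}
print(find_version_mismatches(expected, installed))
```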

Model training

Utility module

cnn_train.py

import csv
import json
import os
from pathlib import Path

import cv2
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, models

from utils.cv_common_utils import CvCommonUtils

script_path = Path(__file__).resolve()
project_path = str(script_path.parent.parent.parent)
tsv_path = os.path.join(project_path, "ml-resource", "ml", "Word.tsv")
model_filepath = os.path.join(project_path, "ml-resource", "cnn_model.h5")
json_filepath = os.path.join(project_path, "ml-resource", 'cnn_label_map.json')
img_size = 32


def preprocess_image(image):
    # Resize only when the image is not already img_size x img_size (shape is height, width)
    if image.shape[:2] != (img_size, img_size):
        image = cv2.resize(image, (img_size, img_size))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # With THRESH_OTSU set, OpenCV computes the threshold itself and ignores the 210
    _, image = cv2.threshold(image, 210, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    # Scale pixel values to the range [0, 1]
    image = image / 255.0
    return image


def load_data():
    images = []
    labels = []
    # Open the TSV file
    with open(tsv_path, 'r', newline='', encoding='utf-8') as file:
        reader = csv.reader(file, delimiter='\t')
        # Read each row: column 0 is the image path, column 1 is the label
        for row in reader:
            if len(row) >= 2:
                image_path = row[0]
                image = CvCommonUtils.read_img(image_path)
                processed_image = preprocess_image(image)
                images.append(processed_image)
                labels.append(row[1])
        return np.array(images), np.array(labels)


images, labels = load_data()

label_to_int = {label: i for i, label in enumerate(np.unique(labels))}
int_labels = np.array([label_to_int[label] for label in labels])

X_train, X_test, y_train, y_test = train_test_split(
    images,
    int_labels,
    test_size=0.2,
    random_state=42
)

X_train = X_train.reshape(-1, img_size, img_size, 1)
X_test = X_test.reshape(-1, img_size, img_size, 1)

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(img_size, img_size, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(len(label_to_int), activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))

test_loss, test_acc = model.evaluate(X_test, y_test)
print(f'Test accuracy: {test_acc}')

# Save the model
model.save(model_filepath)

# Save the index-to-label mapping
int_to_label = {v: k for k, v in label_to_int.items()}
with open(json_filepath, 'w', encoding='utf-8') as file:
    json.dump(int_to_label, file, ensure_ascii=False, indent=4)
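
The training script reads Word.tsv row by row and treats the first column as the image path and the second as the label. The article does not show the file itself, so the sample rows below are made up for illustration; only the two-column, tab-separated layout is taken from the code. A minimal sketch of the parsing step, using an in-memory string instead of a real file:

```python
import csv
import io

# Hypothetical sample content; these paths and labels are not from the real dataset.
sample_tsv = "ml/ml_c_word/A_001.jpg\tA\nml/ml_c_word/B_001.jpg\tB\n\n"


def read_rows(file_obj):
    """Yield (image_path, label) pairs, skipping rows with fewer than two columns."""
    reader = csv.reader(file_obj, delimiter="\t")
    for row in reader:
        if len(row) >= 2:
            yield row[0], row[1]


rows = list(read_rows(io.StringIO(sample_tsv)))
print(rows)
```

The `len(row) >= 2` guard is what lets blank or incomplete lines pass through without raising an IndexError.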

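Code explanation:

Data splitting

X_train, X_test, y_train, y_test = train_test_split(images, int_labels, test_size=0.2, random_state=42)

Methods and parameters:

train_test_split is a scikit-learn function that splits a dataset into a training set and a test set.
images is the input image data, and int_labels is the corresponding label data.
test_size=0.2 means 20% of the dataset is used as the test set and the remaining 80% as the training set.
random_state=42 sets the random seed so that the split is identical every time the code runs.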
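
To illustrate what a seeded 80/20 split does, here is a pure-Python sketch. It is not scikit-learn's implementation and will not reproduce the exact partition that train_test_split produces; the function name `simple_split` is my own. It only shows the idea: shuffle with a fixed seed, then cut.

```python
import random


def simple_split(items, labels, test_size=0.2, seed=42):
    """Shuffle indices with a fixed seed and use the first `test_size` share as the test set."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)  # same seed, same order on every run
    n_test = int(round(len(items) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return pick(items, train_idx), pick(items, test_idx), pick(labels, train_idx), pick(labels, test_idx)


items = [f"img_{i}" for i in range(10)]   # stand-ins for image arrays
labels = [i % 2 for i in range(10)]       # stand-ins for integer labels
X_train, X_test, y_train, y_test = simple_split(items, labels)
print(len(X_train), len(X_test))
```

Items and labels are indexed with the same shuffled indices, so each image stays paired with its label.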

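Data reshaping

X_train = X_train.reshape(-1, img_size, img_size, 1)
X_test = X_test.reshape(-1, img_size, img_size, 1)

Methods and parameters:

reshape changes the shape of an array.
-1 tells NumPy to compute the size of that dimension automatically so that the total number of elements stays the same.
img_size, img_size are the image height and width (32 in this script).
1 is the number of channels; the images are assumed to be grayscale (single channel).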
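
How the -1 gets resolved can be shown without NumPy. The sketch below mimics the rule in plain Python; `resolve_shape` is a hypothetical helper written for this illustration, not a NumPy function, and the figure of 80 images is an example value.

```python
def resolve_shape(total_elements, shape):
    """Replace a single -1 in `shape` with the size that keeps the element count unchanged."""
    shape = tuple(shape)
    if shape.count(-1) > 1:
        raise ValueError("only one dimension may be -1")
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if -1 not in shape:
        if known != total_elements:
            raise ValueError("shape does not match the element count")
        return shape
    if known == 0 or total_elements % known != 0:
        raise ValueError("element count is not divisible by the known dimensions")
    return tuple(total_elements // known if dim == -1 else dim for dim in shape)


img_size = 32
total = 80 * img_size * img_size  # e.g. 80 grayscale images of 32x32 pixels
print(resolve_shape(total, (-1, img_size, img_size, 1)))
```

The first dimension comes out as the number of images, which is why the same reshape call works for both the training and the test set.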

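Model definition

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(img_size, img_size, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(len(label_to_int), activation='softmax')
])

Methods and parameters:

models.Sequential is a class for building sequential models, that is, models made of layers stacked one after another.
Layer 1:
layers.Conv2D(32, (3, 3), activation='relu', input_shape=(img_size, img_size, 1))
- Conv2D is a two-dimensional convolution layer.
- 32 is the number of convolution kernels (filters).
- (3, 3) is the kernel size.
- activation='relu' selects the ReLU activation function.
- input_shape=(img_size, img_size, 1) declares the input as img_size x img_size pixels (32x32 in this script) with a single channel.
Layer 2:
layers.MaxPooling2D((2, 2))
- MaxPooling2D is a two-dimensional max-pooling layer.
- (2, 2) is the size of the pooling window.
Layer 3:
layers.Conv2D(64, (3, 3), activation='relu')
- Same as layer 1, but with 64 kernels.
Layer 4:
layers.MaxPooling2D((2, 2))
- Same as layer 2.
Layer 5:
layers.Conv2D(64, (3, 3), activation='relu')
- Same as layer 3.
Layer 6:
layers.Flatten()
- Flatten collapses the multi-dimensional input into a one-dimensional vector.
Layer 7:
layers.Dense(64, activation='relu')
- Dense is a fully connected layer.
- 64 is the output dimension.
- activation='relu' selects the ReLU activation function.
Layer 8:
layers.Dense(len(label_to_int), activation='softmax')
- This Dense layer serves as the output layer.
- len(label_to_int) is the number of output classes.
- activation='softmax' selects the softmax activation function, which is used for multi-class classification.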
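
The layer list above can be traced by hand to see how the spatial size shrinks and where the parameters sit. The sketch below does that arithmetic in plain Python, assuming the Keras defaults the model relies on (convolutions with 'valid' padding and stride 1, pooling with stride equal to the window). The helper names and the class count of 36 are assumptions for illustration; the real class count is len(label_to_int).

```python
def conv_out(size, kernel=3):
    """Output side length of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output side length of max pooling with window and stride `pool`."""
    return size // pool


def trace_model(img_size, num_classes):
    """Return (flattened_size, total_params) for the architecture described above."""
    side = conv_out(img_size)                 # Conv2D(32) on a 1-channel input
    params = (3 * 3 * 1 + 1) * 32             # weights + bias per filter
    side = pool_out(side)                     # MaxPooling2D
    side = conv_out(side)                     # Conv2D(64) on 32 channels
    params += (3 * 3 * 32 + 1) * 64
    side = pool_out(side)                     # MaxPooling2D
    side = conv_out(side)                     # Conv2D(64) on 64 channels
    params += (3 * 3 * 64 + 1) * 64
    flat = side * side * 64                   # Flatten
    params += (flat + 1) * 64                 # Dense(64)
    params += (64 + 1) * num_classes          # output Dense
    return flat, params


for size in (32, 64):
    flat, params = trace_model(size, num_classes=36)  # 36 classes is an assumed example
    print(f"input {size}x{size}: flatten -> {flat}, parameters -> {params}")
```

Most of the parameters sit in the first Dense layer, so doubling the input side length grows the model mainly through the flattened size.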

Character recognition

Utility module

cnn_recognition.py

import json
import os
from pathlib import Path

import cv2
import numpy as np
from tensorflow.keras import models

from utils.cv_common_utils import CvCommonUtils

script_path = Path(__file__).resolve()
project_path = str(script_path.parent.parent.parent)
model_filepath = os.path.join(project_path, "ml-resource", "cnn_model.h5")
json_filepath = os.path.join(project_path, "ml-resource", 'cnn_label_map.json')
img_size = 32

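# Load the index-to-label mapping
with open(json_filepath, 'r', encoding='utf-8') as file:
    int_to_label = json.load(file)

# The model is loaded lazily on first use
loaded_model = None


def load_model_init():
    global loaded_model
    if loaded_model is None:
        loaded_model = models.load_model(model_filepath)


def preprocess_image(image):
    # Expects a grayscale image; resize only when it is not already img_size x img_size
    if image.shape[:2] != (img_size, img_size):
        image = cv2.resize(image, (img_size, img_size))
    # With THRESH_OTSU set, OpenCV computes the threshold itself and ignores the 210
    _, image = cv2.threshold(image, 210, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    image = image / 255.0
    return image


# Predict the character from an image file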
def predict_character_by_path(image_path):
    image = CvCommonUtils.read_img(image_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return predict_character(image)


def predict_character(image):
    load_model_init()
    processed_image = preprocess_image(image)
    processed_image = processed_image.reshape(1, img_size, img_size, 1)
    prediction = loaded_model.predict(processed_image)
    predicted_label = np.argmax(prediction, axis=1)
    return int_to_label[str(predicted_label[0])]
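
The lookup `int_to_label[str(predicted_label[0])]` converts the index to a string because JSON object keys are always strings, so the integer keys written by the training script come back as text. A short sketch of that round trip, using example labels rather than the project's real label set:

```python
import json

label_to_int = {"A": 0, "B": 1, "C": 2}  # example labels, not the article's data
int_to_label = {v: k for k, v in label_to_int.items()}

# Serialising and parsing again mimics saving to and loading from cnn_label_map.json.
restored = json.loads(json.dumps(int_to_label, ensure_ascii=False))

print(list(restored.keys()))            # the keys are now strings
predicted_index = 1
print(restored[str(predicted_index)])   # look up with the string form of the index
```

An alternative would be to convert the keys back with `{int(k): v for k, v in restored.items()}` right after loading, which keeps the prediction code free of `str()` calls.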

Testing

import os
from pathlib import Path

from utils.cnn.cnn_recognition import predict_character_by_path

script_path = Path(__file__).resolve()
project_path = str(script_path.parent)

image_path = os.path.join(project_path, "ml-resource", "ml", "ml_c_word", "F_003.jpg")
predicted_character = predict_character_by_path(image_path)
print(f'Predicted character: {predicted_character}')
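
A single image says little about how the CNN compares with KNN, so a natural next step is to measure accuracy over a set of labeled images. The sketch below is one way to do that. The `accuracy` helper, the stand-in predictor, and all file names other than F_003.jpg are hypothetical; in the project you would pass predict_character_by_path as the `predict` argument.

```python
def accuracy(samples, predict):
    """Share of (input, expected_label) pairs for which predict(input) == expected_label."""
    if not samples:
        return 0.0
    correct = sum(1 for item, expected in samples if predict(item) == expected)
    return correct / len(samples)


# Stand-in predictor so the sketch runs without a trained model.
fake_predictions = {"F_003.jpg": "F", "A_001.jpg": "A", "B_001.jpg": "8"}
samples = [("F_003.jpg", "F"), ("A_001.jpg", "A"), ("B_001.jpg", "B")]
print(f"Accuracy: {accuracy(samples, fake_predictions.get):.2%}")
```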

