Pose Estimation in Practice: A Complete Walkthrough of Deploying YOLOv8 Pose with ONNX Runtime

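This article explains how to deploy a YOLOv8 pose estimation model with ONNX Runtime, independent of the official YOLO environment. It covers model loading, image preprocessing (letterbox scaling and padding), inference, output decoding (bounding boxes and keypoints), non-maximum suppression (NMS), and result visualization. It also discusses performance optimization and common deployment problems.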

1. Introduction

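Pose estimation is an important computer vision task that locates the keypoints of a human body in an image or video. YOLOv8 Pose is a real-time pose estimation model released by Ultralytics that combines object detection and keypoint estimation in a single end-to-end network. To deploy the model efficiently in a variety of environments, we use ONNX Runtime (ORT), which supports cross-platform inference on both CPU and GPU and does not depend on the original training framework.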

2. Model Loading and Initialization

The initializer of the YOLOv8Pose class loads the ONNX model and configures the inference session:

class YOLOv8Pose:
    def __init__(self, model_path, conf_thres=0.1, iou_thres=0.45):
        self.conf_thres = conf_thres
        self.iou_thres = iou_thres

        # Initialize the ONNX Runtime session
        self.session = ort.InferenceSession(model_path)
        self.input_name = self.session.get_inputs()[0].name
        self.output_name = self.session.get_outputs()[0].name
        self.input_shape = self.session.get_inputs()[0].shape[2:]  # (h, w)

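Notes:

model_path: path to the ONNX model file.
conf_thres: confidence threshold used to discard low-confidence detections.
iou_thres: IoU threshold used by NMS.
The input shape (height and width) is read from the model's first input and is typically 640x640. This assumes the model was exported with static dimensions; with dynamic axes, ONNX Runtime reports symbolic names or None instead of integers.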
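
If the model was exported with dynamic axes, slicing the reported shape does not yield usable integers. The sketch below shows one way to guard against that. The helper name resolve_input_hw and the 640x640 fallback are assumptions made for illustration and are not part of the original class.

```python
def resolve_input_hw(shape, fallback=(640, 640)):
    """Return (height, width) from an NCHW shape list.

    `shape` is expected to look like the list returned by
    session.get_inputs()[0].shape. Dynamic dimensions appear there as
    strings or None, in which case the (assumed) fallback size is used.
    """
    h, w = shape[2], shape[3]
    if isinstance(h, int) and isinstance(w, int):
        return (h, w)
    return fallback


static_hw = resolve_input_hw([1, 3, 640, 640])
dynamic_hw = resolve_input_hw(["batch", 3, "height", "width"])
print(static_hw, dynamic_hw)
```

With a static export the helper simply returns the model's own size, so the rest of the pipeline is unchanged.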

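3. Image Preprocessing: Letterbox Scaling and Padding

The model input size is fixed while input images vary in size, so each image must be resized to the model input size while preserving its aspect ratio to avoid distortion. The letterbox algorithm does this: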

    def preprocess(self, img):
        # Original image size
        self.orig_h, self.orig_w = img.shape[:2]
        # Scale factor: take the smaller ratio so the longer side fits the
        # model input size and the shorter side is scaled proportionally
        scale = min(self.input_shape[0] / self.orig_h, self.input_shape[1] / self.orig_w)

        # Size after scaling, as (width, height)
        self.new_unpad = (int(self.orig_w * scale), int(self.orig_h * scale))
        # Padding needed on each side to reach the model input size
        self.dw = (self.input_shape[1] - self.new_unpad[0]) / 2  # horizontal padding
        self.dh = (self.input_shape[0] - self.new_unpad[1]) / 2  # vertical padding

        # Resize
        if (self.new_unpad[0], self.new_unpad[1]) != (self.orig_w, self.orig_h):
            img = cv2.resize(img, self.new_unpad, interpolation=cv2.INTER_LINEAR)
        # Add padding (top, bottom, left, right); the +/-0.1 splits an odd
        # total so that the extra pixel goes to the bottom/right
        top, bottom = int(round(self.dh - 0.1)), int(round(self.dh + 0.1))
        left, right = int(round(self.dw - 0.1)), int(round(self.dw + 0.1))
        img = cv2.copyMakeBorder(img, top, bottom, left, right,
                                 cv2.BORDER_CONSTANT, value=(114, 114, 114))

        # Channel conversion and normalization
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # BGR -> RGB
        img = img.transpose(2, 0, 1)  # HWC -> CHW
        img = np.ascontiguousarray(img, dtype=np.float32) / 255.0  # normalize to [0, 1]
        return np.expand_dims(img, axis=0)  # add the batch dimension
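
To make the letterbox arithmetic concrete, the sketch below repeats only the geometry from preprocess, without touching any image data. The function name letterbox_params and the 1280x720 and 1280x722 example sizes are assumptions chosen for illustration.

```python
def letterbox_params(orig_h, orig_w, input_h=640, input_w=640):
    """Compute scale, resized size, and per-side padding for a letterbox."""
    scale = min(input_h / orig_h, input_w / orig_w)
    new_w, new_h = int(orig_w * scale), int(orig_h * scale)
    dw = (input_w - new_w) / 2  # horizontal padding per side (may end in .5)
    dh = (input_h - new_h) / 2  # vertical padding per side (may end in .5)
    # Same rounding trick as in preprocess: an odd total puts the extra
    # pixel on the bottom/right side.
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    return {
        "scale": scale,
        "new_size": (new_w, new_h),
        "pads": (top, bottom, left, right),
    }


even = letterbox_params(720, 1280)  # 16:9 frame, even vertical padding
odd = letterbox_params(722, 1280)   # odd vertical padding total
print(even)
print(odd)
```

For the 1280x720 frame the scale is 0.5, the resized image is 640x360, and 140 rows of padding are added above and below. For the 722-row frame the padding splits into 139 and 140, so the padded height still sums to 640.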

4. Model Inference

Inference itself is straightforward because the input has already been prepared:

    # In the main function:
    input_tensor = model.preprocess(img)
    outputs = model.session.run([model.output_name], {model.input_name: input_tensor})

Note: session.run takes a list of output names (only one output is needed here) and a dictionary that maps input names to input tensors.
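
Shape and dtype mismatches are a common cause of errors at this step, so it can help to check the tensor before calling session.run. The sketch below is one possible check. The helper check_input_tensor is hypothetical, and the float32 requirement assumes a default FP32 export (an FP16 export would expect float16 instead).

```python
import numpy as np


def check_input_tensor(tensor, input_hw):
    """Return a list of problems; an empty list means the tensor looks valid."""
    problems = []
    expected = (1, 3, input_hw[0], input_hw[1])  # NCHW with batch size 1
    if tensor.shape != expected:
        problems.append(f"shape {tensor.shape} != {expected}")
    if tensor.dtype != np.float32:
        problems.append(f"dtype {tensor.dtype} is not float32")
    if tensor.size and (tensor.min() < 0.0 or tensor.max() > 1.0):
        problems.append("values fall outside [0, 1]")
    return problems


good = np.zeros((1, 3, 640, 640), dtype=np.float32)
bad = np.zeros((640, 640, 3), dtype=np.uint8)  # raw HWC image, not preprocessed
print(check_input_tensor(good, (640, 640)))
print(check_input_tensor(bad, (640, 640)))
```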

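5. Post-processing: Parsing the Model Output

The model output is a tensor of shape [1, 11, 8400] (for the model used in this article), where:

1: the batch size.
11: the number of values per predicted box (4 bounding box coordinates + 1 confidence + 6 keypoint values; each keypoint has 3 values x, y, score, so two keypoints give 6 values).
8400: the number of predicted boxes.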
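
The numbers 11 and 8400 can be derived from the model configuration. The sketch below assumes the standard YOLOv8 detection strides of 8, 16, and 32 and a single object class; both are assumptions about the model used here, and the function name is hypothetical.

```python
def expected_output_shape(input_hw, num_classes, num_keypoints, strides=(8, 16, 32)):
    """Estimate the raw output shape of a YOLOv8 pose head.

    channels = 4 box values + one score per class + 3 values per keypoint
    anchors  = total number of grid cells over all stride levels
    """
    h, w = input_hw
    channels = 4 + num_classes + 3 * num_keypoints
    anchors = sum((h // s) * (w // s) for s in strides)
    return (1, channels, anchors)


shape = expected_output_shape((640, 640), num_classes=1, num_keypoints=2)
print(shape)
```

For a 640x640 input the three grids are 80x80, 40x40, and 20x20, which add up to 8400 cells. Comparing this estimate with the actual output shape is a quick way to confirm the keypoint count before hard-coding the reshape.
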
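
Post-processing consists of the following steps:

Transpose the output to obtain a matrix of shape [8400, 11].
Discard predictions whose confidence is below the threshold.
Convert the bounding box format from (cx, cy, w, h) to (x1, y1, x2, y2).
Parse the keypoints (reshape to [N, 2, 3], where 2 is the number of keypoints and each keypoint has x, y, score).
Map the coordinates from the model input size back to the original image size (reversing the scaling and padding from preprocessing).
Apply non-maximum suppression (NMS) to remove redundant boxes.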

    def postprocess(self, outputs):
        predictions = outputs[0][0].T  # transpose to [8400, 11]

        # 1. Filter by confidence threshold
        conf_mask = predictions[:, 4] > self.conf_thres
        predictions = predictions[conf_mask]
        if predictions.shape[0] == 0:
            return [], [], []   # no detections

        # 2. Box conversion (cx, cy, w, h) -> (x1, y1, x2, y2)
        boxes = predictions[:, :4].copy()
        boxes[:, 0] = boxes[:, 0] - boxes[:, 2] / 2  # x1 = cx - w/2
        boxes[:, 1] = boxes[:, 1] - boxes[:, 3] / 2  # y1 = cy - h/2
        boxes[:, 2] = boxes[:, 0] + boxes[:, 2]      # x2 = x1 + w
        boxes[:, 3] = boxes[:, 1] + boxes[:, 3]      # y2 = y1 + h

        # 3. Keypoints: reshape the 6 values (2 keypoints) to [2, 3]
        keypoints = predictions[:, 5:].reshape(-1, 2, 3)  # [n, 2, 3]

        # 4. Coordinate conversion (map back to the original image size)
        # Recompute the scale factor used in preprocess
        scale = min(self.input_shape[0] / self.orig_h, self.input_shape[1] / self.orig_w)
        # Adjust the boxes
        boxes[:, [0, 2]] -= self.dw  # subtract horizontal padding
        boxes[:, [1, 3]] -= self.dh   # subtract vertical padding
        boxes[:, :4] /= scale         # rescale to the original image size

        # Adjust the keypoints
        keypoints[:, :, 0] -= self.dw   # subtract horizontal padding from x
        keypoints[:, :, 1] -= self.dh   # subtract vertical padding from y
        keypoints[:, :, :2] /= scale    # rescale to the original image size

        # Round to integers (this also rounds each keypoint score to 0 or 1)
        boxes = boxes.round().astype(int)
        keypoints = keypoints.round().astype(int)

        # 5. NMS. cv2.dnn.NMSBoxes expects boxes as (x, y, w, h),
        # so convert from (x1, y1, x2, y2) before the call
        scores = predictions[:, 4]
        nms_boxes = boxes.copy()
        nms_boxes[:, 2] -= nms_boxes[:, 0]  # w = x2 - x1
        nms_boxes[:, 3] -= nms_boxes[:, 1]  # h = y2 - y1
        indices = cv2.dnn.NMSBoxes(nms_boxes.tolist(), scores.tolist(), self.conf_thres, self.iou_thres)

        # If nothing is kept, return empty lists; otherwise index with the kept indices
        if len(indices) > 0:
            indices = np.array(indices).flatten()
            return boxes[indices], scores[indices], keypoints[indices]
        else:
            return [], [], []
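
The coordinate mapping in step 4 is the exact inverse of the letterbox transform. The sketch below checks this with a round trip on a single point. The function names and the example values (a 1280x720 image letterboxed into 640x640, giving scale 0.5, dw 0, and dh 140) are assumptions for illustration.

```python
def letterbox_forward(x, y, scale, dw, dh):
    """Map a point from original-image pixels to model-input pixels."""
    return x * scale + dw, y * scale + dh


def letterbox_inverse(x, y, scale, dw, dh):
    """Undo the letterbox: subtract the padding first, then divide by the scale."""
    return (x - dw) / scale, (y - dh) / scale


scale, dw, dh = 0.5, 0.0, 140.0
model_xy = letterbox_forward(400, 300, scale, dw, dh)
orig_xy = letterbox_inverse(model_xy[0], model_xy[1], scale, dw, dh)
print(model_xy, orig_xy)
```

The order matters: dividing by the scale before subtracting the padding would shift every coordinate by dw / scale or dh / scale.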

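Notes:

During coordinate conversion, the padding (dw and dh) is subtracted first, then the result is divided by scale.
round().astype(int) converts the coordinates to integers. Because it is applied to the whole keypoint array, the keypoint scores are also rounded to 0 or 1; the 0.5 threshold used later for drawing still gives the same result, but keep a float copy if the raw scores are needed.
OpenCV's NMSBoxes function performs non-maximum suppression and returns the indices of the boxes to keep. It expects each box as (x, y, width, height) rather than (x1, y1, x2, y2).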
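
To show what the NMS step does, here is a simplified greedy NMS in pure Python, together with the box format conversion needed for NMSBoxes. It is a minimal sketch of the general technique, not OpenCV's implementation, and the function names and sample boxes are made up for illustration.

```python
def iou_xyxy(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thres):
    """Keep the highest-scoring box, drop boxes that overlap it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou_xyxy(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


def xyxy_to_xywh(box):
    """Convert (x1, y1, x2, y2) to the (x, y, w, h) layout used by NMSBoxes."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


boxes = [[100, 100, 200, 200], [105, 105, 205, 205], [300, 300, 400, 400]]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores, iou_thres=0.45)
print(kept, xyxy_to_xywh(boxes[0]))
```

The first two boxes overlap with an IoU of about 0.82, so the lower-scoring one is suppressed, while the third box does not overlap and is kept.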

6. Result Visualization

The visualization function draws the bounding boxes and keypoints on the image:

    def visualize(self, image, boxes, keypoints):
        # Draw the bounding boxes
        for box in boxes:
            x1, y1, x2, y2 = (int(v) for v in box)  # plain ints for OpenCV
            cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Draw the keypoints and the connecting line
        for kpts in keypoints:
            # Draw each keypoint (first in red, second in blue; colors are BGR)
            for i, (x, y, score) in enumerate(kpts):
                if score > 0.5:  # keypoint confidence threshold
                    color = (0, 0, 255) if i == 0 else (255, 0, 0)
                    cv2.circle(image, (int(x), int(y)), 5, color, -1)

            # Connect the two keypoints if both have high confidence
            if len(kpts) == 2 and all(kpts[:, 2] > 0.5):
                x1, y1, _ = kpts[0]
                x2, y2, _ = kpts[1]
                cv2.line(image, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 255), 2)
        return image

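Notes:

Bounding boxes are drawn as green rectangles.
The first keypoint (index 0) is drawn as a red dot and the second keypoint (index 1) as a blue dot.
If both keypoints have a confidence above 0.5, a yellow line is drawn between them.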

7. Main Function Flow

if __name__ == "__main__":
    model_path = "./runs/pose/train16/weights/best.onnx"
    image_path = "./input/test.png"

    # Read the image
    img = cv2.imread(image_path)
    if img is None:
        raise ValueError(f"Error: Unable to read image from {image_path}")

    # Create the model instance
    model = YOLOv8Pose(model_path)

    # Preprocess
    input_tensor = model.preprocess(img)

    # Run inference
    outputs = model.session.run([model.output_name], {model.input_name: input_tensor})

    # Post-process
    boxes, scores, keypoints = model.postprocess(outputs)

    # Visualize
    result = model.visualize(img.copy(), boxes, keypoints)
    cv2.imshow("Result", result)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

8. Complete Code

import cv2
import numpy as np
import onnxruntime as ort


class YOLOv8Pose:
    def __init__(self, model_path, conf_thres=0.1, iou_thres=0.45):
        self.conf_thres = conf_thres
        self.iou_thres = iou_thres

        # Initialize the ONNX Runtime session
        self.session = ort.InferenceSession(model_path)
        self.input_name = self.session.get_inputs()[0].name
        self.output_name = self.session.get_outputs()[0].name
        self.input_shape = self.session.get_inputs()[0].shape[2:]  # (h, w)

    def preprocess(self, img):
        # Letterbox (preserves the aspect ratio)
        self.orig_h, self.orig_w = img.shape[:2]
        scale = min(self.input_shape[0] / self.orig_h, self.input_shape[1] / self.orig_w)

        # Compute the resized size and the padding
        self.new_unpad = (int(self.orig_w * scale), int(self.orig_h * scale))
        self.dw = (self.input_shape[1] - self.new_unpad[0]) / 2  # horizontal padding
        self.dh = (self.input_shape[0] - self.new_unpad[1]) / 2  # vertical padding

        # Resize and pad
        if (self.new_unpad[0], self.new_unpad[1]) != (self.orig_w, self.orig_h):
            img = cv2.resize(img, self.new_unpad, interpolation=cv2.INTER_LINEAR)
        top, bottom = int(round(self.dh - 0.1)), int(round(self.dh + 0.1))
        left, right = int(round(self.dw - 0.1)), int(round(self.dw + 0.1))
        img = cv2.copyMakeBorder(img, top, bottom, left, right,
                                 cv2.BORDER_CONSTANT, value=(114, 114, 114))

        # Convert the color channels and the layout
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = img.transpose(2, 0, 1)  # HWC -> CHW
        img = np.ascontiguousarray(img, dtype=np.float32) / 255.0
        return np.expand_dims(img, axis=0)

    def postprocess(self, outputs):
        # Reshape the output [1, 11, 8400] -> [8400, 11]
        predictions = outputs[0][0].T

        # Filter out low-confidence predictions
        conf_mask = predictions[:, 4] > self.conf_thres
        predictions = predictions[conf_mask]
        if predictions.shape[0] == 0:
            return [], [], []

        # Convert box coordinates (cx, cy, w, h) -> (x1, y1, x2, y2)
        boxes = predictions[:, :4].copy()
        boxes[:, 0] = (boxes[:, 0] - boxes[:, 2] / 2)  # x1
        boxes[:, 1] = (boxes[:, 1] - boxes[:, 3] / 2)  # y1
        boxes[:, 2] += boxes[:, 0]  # x2
        boxes[:, 3] += boxes[:, 1]  # y2

        # Keypoints (two per object, each with x, y, score)
        keypoints = predictions[:, 5:].reshape(-1, 2, 3)  # [N, 2, 3]

        # Map coordinates back to the original image space
        scale = min(self.input_shape[0] / self.orig_h, self.input_shape[1] / self.orig_w)

        # Adjust the boxes
        boxes[:, [0, 2]] -= self.dw  # subtract horizontal padding
        boxes[:, [1, 3]] -= self.dh  # subtract vertical padding
        boxes /= scale
        boxes = boxes.round().astype(int)

        # Adjust the keypoints (rounding also turns each score into 0 or 1)
        keypoints[:, :, 0] -= self.dw
        keypoints[:, :, 1] -= self.dh
        keypoints[:, :, :2] /= scale
        keypoints = keypoints.round().astype(int)

        # Apply NMS
        scores = predictions[:, 4]
        indices = self.nms(boxes, scores)
        return boxes[indices], scores[indices], keypoints[indices]

    def nms(self, boxes, scores):
        # cv2.dnn.NMSBoxes expects boxes as (x, y, w, h),
        # so convert from (x1, y1, x2, y2) first
        xywh = boxes.copy()
        xywh[:, 2] -= xywh[:, 0]  # w = x2 - x1
        xywh[:, 3] -= xywh[:, 1]  # h = y2 - y1
        indices = cv2.dnn.NMSBoxes(
            xywh.tolist(),
            scores.tolist(),
            self.conf_thres,
            self.iou_thres
        )
        # Flatten to a 1-D integer array so it can index the result arrays,
        # including the case where no box is kept
        return np.array(indices, dtype=int).flatten()

    def visualize(self, image, boxes, keypoints):
        # Draw the bounding boxes
        for box in boxes:
            x1, y1, x2, y2 = (int(v) for v in box)  # plain ints for OpenCV
            cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Draw the keypoints and the connecting line
        for kpts in keypoints:
            # Draw the keypoints (first in red, second in blue; colors are BGR)
            for i, (x, y, score) in enumerate(kpts):
                if score > 0.5:
                    color = (0, 0, 255) if i == 0 else (255, 0, 0)
                    cv2.circle(image, (int(x), int(y)), 5, color, -1)

            # Draw the line between the two keypoints
            if len(kpts) == 2 and all(kpts[:, 2] > 0.5):
                x1, y1 = kpts[0][:2]
                x2, y2 = kpts[1][:2]
                cv2.line(image, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 255), 2)
        return image



if __name__ == "__main__":
    model_path = "./runs/pose/train16/weights/best.onnx"
    image_path = "./input/test.png"

    # Read the image
    img = cv2.imread(image_path)
    if img is None:
        raise ValueError(f"Error: Unable to read image from {image_path}")

    # Create the YOLOv8Pose instance
    model = YOLOv8Pose(model_path)

    # Preprocess
    input_tensor = model.preprocess(img)

    # Run inference
    outputs = model.session.run([model.output_name], {model.input_name: input_tensor})

    # Post-process
    boxes, scores, keypoints = model.postprocess(outputs)

    # Visualize
    result = model.visualize(img.copy(), boxes, keypoints)
    cv2.imshow("Result", result)
    cv2.waitKey(0)
    cv2.destroyAllWindows()