
arXiv-2020
Table of Contents
- 1 Background and Motivation
- 2 Advantages / Contributions
- 3 Method
- 4 Experiments
- 5 Conclusion(own)
1 Background and Motivation
Challenges in human keypoint estimation: a wide variety of poses, numerous degrees of freedom, and occlusions.
Rather than tackling these challenges directly, this paper focuses on speed.
2 Advantages / Contributions
- a novel body pose tracking solution
- a lightweight body pose estimation neural network
3 Method
The overall inference pipeline is shown below; it combines tracking with keypoint detection.
 
The tracker predicts
- key-point coordinates
- the presence of the person on the current frame
- the refined region of interest for the current frame
When the tracker indicates that there is no human present, we re-run the detector network on the next frame.
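The detect-then-track loop described above can be sketched as follows. This is only an illustration of the control flow: `detect` and `track` are hypothetical callables standing in for the detector and tracker networks, and the `Box` format is an assumption, not something the paper specifies.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Box = Tuple[float, float, float, float]   # hypothetical RoI format: (x, y, width, height)
Keypoints = List[Tuple[float, float]]


def run_pipeline(
    frames: Iterable[object],
    detect: Callable[[object], Optional[Box]],
    track: Callable[[object, Box], Tuple[Keypoints, bool, Box]],
) -> List[Optional[Keypoints]]:
    """Run the detector only when no RoI is carried over from the previous frame."""
    results: List[Optional[Keypoints]] = []
    roi: Optional[Box] = None
    for frame in frames:
        if roi is None:
            roi = detect(frame)          # (re-)run the detector
            if roi is None:              # nothing found on this frame
                results.append(None)
                continue
        keypoints, present, refined_roi = track(frame, roi)
        if present:
            results.append(keypoints)
            roi = refined_roi            # reuse the refined RoI on the next frame
        else:
            results.append(None)
            roi = None                   # forces the detector to run on the next frame
    return results
```

The point of this design is that the detector is skipped on every frame where the tracker still reports a person, so the per-frame cost is usually just the tracker.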
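Note that no full-body person detector is used. Instead, a face detector serves as a proxy for finding the RoI. Alignment relies on the face, the midpoint of the hips, the midpoint of the shoulders, and the angle of the line connecting these two midpoints; rotating the person so that this line is vertical aligns the input.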

Image source: 简单几行代码玩转实时人体姿态追踪算法BlazePose
The aligned pose resembles Leonardo da Vinci's Vitruvian Man; this alignment also helps tracking.
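As a rough illustration of the alignment step, the sketch below computes the tilt of the hip-midpoint-to-shoulder-midpoint axis and rotates points so that this axis becomes vertical. It assumes image coordinates (x to the right, y downward) and assumes the four torso points are already known; in the paper the detector predicts the alignment quantities itself, so the function names and inputs here are hypothetical.

```python
import math


def midpoint(a, b):
    """Midpoint of two (x, y) points."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def torso_tilt(left_shoulder, right_shoulder, left_hip, right_hip):
    """Angle (radians) between the hip-to-shoulder axis and the upward vertical.

    0 means upright; a positive value means the shoulders are shifted toward +x.
    """
    sx, sy = midpoint(left_shoulder, right_shoulder)
    hx, hy = midpoint(left_hip, right_hip)
    return math.atan2(sx - hx, -(sy - hy))   # y points down, so "up" is -y


def rotate_about(point, center, angle):
    """Rotate point about center by angle, using the standard 2D rotation matrix."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (center[0] + cos_a * dx - sin_a * dy,
            center[1] + sin_a * dx + cos_a * dy)


def align_upright(points, left_shoulder, right_shoulder, left_hip, right_hip):
    """Rotate points about the hip midpoint so the torso axis points straight up."""
    center = midpoint(left_hip, right_hip)
    angle = -torso_tilt(left_shoulder, right_shoulder, left_hip, right_hip)
    return [rotate_about(p, center, angle) for p in points]
```

After `align_upright`, the shoulder midpoint lies directly above the hip midpoint (same x, smaller y), which is the canonical orientation the notes compare to the Vitruvian Man.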
 
The model predicts 33 keypoints.
 
The keypoints are named as follows:
-  Nose 
-  Left eye inner
-  Left eye 
-  Left eye outer
-  Right eye inner 
-  Right eye 
-  Right eye outer 
-  Left ear 
-  Right ear 
-  Mouth left 
-  Mouth right 
-  Left shoulder 
-  Right shoulder 
-  Left elbow 
-  Right elbow 
-  Left wrist 
-  Right wrist 
-  Left pinky #1 knuckle
-  Right pinky #1 knuckle 
-  Left index #1 knuckle
-  Right index #1 knuckle 
-  Left thumb #2 knuckle
-  Right thumb #2 knuckle 
-  Left hip 
-  Right hip 
-  Left knee 
-  Right knee 
-  Left ankle 
-  Right ankle 
-  Left heel
-  Right heel 
-  Left foot index 
-  Right foot index 
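For working with the predictions in code, a name-to-index lookup is often handy. The sketch below assumes that the order of the list above is the output order and that indices start at 0; the identifier names are illustrative, not taken from any library.

```python
# Keypoint names in the order listed above; 0-based indexing is an assumption.
KEYPOINT_NAMES = (
    "nose",
    "left_eye_inner", "left_eye", "left_eye_outer",
    "right_eye_inner", "right_eye", "right_eye_outer",
    "left_ear", "right_ear",
    "mouth_left", "mouth_right",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_pinky", "right_pinky",
    "left_index", "right_index",
    "left_thumb", "right_thumb",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
    "left_heel", "right_heel",
    "left_foot_index", "right_foot_index",
)

# Reverse lookup: keypoint name -> position in the output.
KEYPOINT_INDEX = {name: i for i, name in enumerate(KEYPOINT_NAMES)}
```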
The keypoint prediction network is structured as follows:

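The network has both a heatmap branch (more accurate) and a coordinate regression branch (faster).
Both branches are used during training and share part of the feature maps, but gradients are not shared: the gradients from the regression encoder are not propagated back to the heatmap-trained features. According to the paper, this does "not only improve the heatmap predictions, but also substantially increase the coordinate regression accuracy".
At inference time, only the regression branch is kept.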
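To make the difference between the two output representations concrete, here is a small pure-Python sketch. It assumes a Gaussian heatmap target and arg-max decoding, both common choices that these notes do not confirm for BlazePose; `sigma` and the grid size are arbitrary illustration values. It does not model the gradient stopping between branches, which needs an autodiff framework.

```python
import math


def gaussian_heatmap(cx, cy, width, height, sigma=1.5):
    """Return a height x width grid (list of rows) peaking at position (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def decode_heatmap(heatmap):
    """Return the (x, y) grid cell holding the maximum value."""
    _, best_x, best_y = max(
        ((value, x, y) for y, row in enumerate(heatmap) for x, value in enumerate(row)),
        key=lambda item: item[0],
    )
    return best_x, best_y


# A keypoint at a sub-pixel location is recovered only up to the grid resolution,
# whereas a regression branch outputs continuous coordinates directly.
target = gaussian_heatmap(5.4, 3.2, width=8, height=8)
decoded = decode_heatmap(target)
```

A heatmap is a dense per-pixel output and so costs more to produce and decode than a handful of regressed numbers, which is consistent with keeping only the regression branch at inference time.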
4 Experiments
Datasets
- AR Dataset
- Yoga Dataset
Training
10% scale and shift augmentations, which helps tracking
simulate occlusions (random rectangles filled with various colors); each keypoint also gets a predicted probability of being visible or accurately localized
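The two augmentations can be sketched roughly as below. The notes only say "10% scale and shift" and "random rectangles filled with various colors", so reading 10% as a fraction of the RoI size, the `(cx, cy, size)` RoI format, and the rectangle size limit `max_fraction` are all assumptions made for illustration.

```python
import random


def jitter_roi(roi, rng, max_fraction=0.10):
    """Randomly shift and scale a (cx, cy, size) RoI by up to max_fraction of its size."""
    cx, cy, size = roi
    shift_x = rng.uniform(-max_fraction, max_fraction) * size
    shift_y = rng.uniform(-max_fraction, max_fraction) * size
    scale = 1.0 + rng.uniform(-max_fraction, max_fraction)
    return (cx + shift_x, cy + shift_y, size * scale)


def occlude(image, rng, max_fraction=0.5):
    """Paint one random rectangle with a random color.

    image is a list of rows of (r, g, b) tuples. Returns a modified copy and
    the rectangle as (x0, y0, width, height); the input is left untouched.
    """
    height, width = len(image), len(image[0])
    rect_w = rng.randint(1, max(1, int(width * max_fraction)))
    rect_h = rng.randint(1, max(1, int(height * max_fraction)))
    x0 = rng.randint(0, width - rect_w)
    y0 = rng.randint(0, height - rect_h)
    color = tuple(rng.randint(0, 255) for _ in range(3))
    out = [list(row) for row in image]
    for y in range(y0, y0 + rect_h):
        for x in range(x0, x0 + rect_w):
            out[y][x] = color
    return out, (x0, y0, rect_w, rect_h)
```

Passing a seeded `random.Random` instance as `rng` keeps the augmentation reproducible.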
Evaluation uses the 17 COCO keypoints; the results are as follows:

Evaluation metric: the Percent of Correct Points with 20% tolerance (PCK@0.2), where a point is assumed to be detected correctly if the 2D Euclidean error is smaller than 20% of the corresponding person's torso size.
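The metric as defined above can be computed for one person with a few lines of Python. How the torso size is measured is not stated in these notes, so it is taken as an input here; the function name and signature are illustrative.

```python
import math


def pck(predicted, ground_truth, torso_size, tolerance=0.2):
    """Fraction of keypoints whose 2D Euclidean error is below tolerance * torso_size."""
    if len(predicted) != len(ground_truth) or not ground_truth:
        raise ValueError("predicted and ground_truth must be non-empty and equally long")
    threshold = tolerance * torso_size
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
        if math.hypot(px - gx, py - gy) < threshold
    )
    return correct / len(ground_truth)
```

For example, with a torso size of 10 the threshold is 2 pixels, so a keypoint that is off by 1 pixel counts as correct and one that is off by 10 pixels does not.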
Qualitative results


5 Conclusion(own)
https://github.com/google/mediapipe
MediaPipe's Pose output is 3D.