2D Spine Pose Estimation

Tags: Workshop Paper, CVPRW, Pose Estimation, Spine Tracking, Biomechanics

Towards Unconstrained 2D Pose Estimation of the Human Spine

Abstract

We introduce SpineTrack, the first comprehensive dataset dedicated to 2D spine pose estimation in unconstrained environments, addressing a critical gap in human pose analysis for sports and biomechanical applications. Existing pose datasets typically represent the spine with a single rigid segment, neglecting the detailed articulation required for precise analysis. To overcome this limitation, SpineTrack comprises two complementary components: SpineTrack-Real, a real-world dataset with high-fidelity spine annotations refined via an active learning pipeline, and SpineTrack-Unreal, a synthetic dataset generated using an Unreal Engine-based framework with accurate ground-truth labels. Additionally, we propose a novel biomechanical validation framework based on OpenSim to enforce anatomical consistency in the annotated keypoints. Complementing the dataset, our SpinePose model extends state-of-the-art body pose estimation networks through a teacher–student distillation approach and an anatomical regularization strategy, effectively incorporating detailed spine keypoints without sacrificing overall performance. Extensive experiments on standard benchmarks and sports-specific scenarios demonstrate that our approach significantly improves spine tracking accuracy while maintaining robust generalization.



Demo videos are sourced from Pexels.com.

Inference

Recommended Python Version: 3.9–3.12

For quick spinal keypoint estimation, we release optimized ONNX models via the spinepose package on PyPI:

pip install spinepose

On Linux/Windows with CUDA available, install the GPU version:

pip install "spinepose[gpu]"

Using the CLI

usage: spinepose [-h] (--version | --input_path INPUT_PATH) [--vis-path VIS_PATH] [--save-path SAVE_PATH] [--mode {xlarge,large,medium,small}] [--nosmooth] [--spine-only]

SpinePose Inference

options:
  -h, --help            show this help message and exit
  --version, -V         Print the version and exit.
  --input_path INPUT_PATH, -i INPUT_PATH
                        Path to the input image or video
  --vis-path VIS_PATH, -o VIS_PATH
                        Path to save the output image or video
  --save-path SAVE_PATH, -s SAVE_PATH
                        Save predictions in OpenPose format (.json for image or folder for video).
  --mode {xlarge,large,medium,small}, -m {xlarge,large,medium,small}
                        Model size. Choose from: xlarge, large, medium, small (default: medium)
  --nosmooth            Disable keypoint smoothing for video inference (default: enabled)
  --spine-only          Only use 9 spine keypoints (default: use all 37 keypoints)

For example, to run inference on a video and save only spine keypoints in OpenPose format:

spinepose --input_path path/to/video.mp4 --save-path path/to/output_dir --spine-only

This automatically downloads the model weights if they are not already present. Pass --vis-path to save the annotated image or video, and run spinepose -h to list all available options.
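
Predictions written with --save-path can be read back in a few lines of Python. The sketch below assumes the files follow the usual OpenPose JSON layout: a top-level `people` list whose entries hold a flat `pose_keypoints_2d` array of x, y, confidence triplets. The exact keys written by spinepose are not documented in this README, so inspect one output file first. The helper name `read_openpose_people` is hypothetical.

```python
import json


def read_openpose_people(doc):
    """Return one list of (x, y, confidence) tuples per detected person.

    Assumes the standard OpenPose layout (an assumption, not confirmed here):
    {"people": [{"pose_keypoints_2d": [x0, y0, c0, x1, y1, c1, ...]}]}
    """
    people = []
    for person in doc.get("people", []):
        flat = person.get("pose_keypoints_2d", [])
        # Group the flat array into (x, y, confidence) triplets.
        triplets = [tuple(flat[i:i + 3]) for i in range(0, len(flat) - 2, 3)]
        people.append(triplets)
    return people


# Tiny in-memory example. For a real file, open it and pass the result
# of json.load(f) to read_openpose_people instead.
example = json.loads(
    '{"people": [{"pose_keypoints_2d": [10.0, 20.0, 0.9, 30.0, 40.0, 0.5]}]}'
)
print(read_openpose_people(example))
```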

Using the Python API

import cv2
from spinepose import SpinePoseEstimator

# Initialize estimator (downloads ONNX model if not found locally)
estimator = SpinePoseEstimator(device='cuda')

# Perform inference on a single image
image = cv2.imread('path/to/image.jpg')
keypoints, scores = estimator(image)
visualized = estimator.visualize(image, keypoints, scores)
cv2.imwrite('output.jpg', visualized)
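
The returned keypoints and scores can be filtered before further use. The sketch below is a minimal example that assumes `keypoints` is indexed as `[person][keypoint] -> (x, y)` and `scores` as `[person][keypoint] -> confidence`; this layout is an assumption and should be checked against the actual return values. The helper `confident_keypoints` and the threshold of 0.3 are hypothetical choices for illustration.

```python
def confident_keypoints(keypoints, scores, threshold=0.3):
    """Keep (index, x, y) for keypoints whose score reaches the threshold.

    Assumed layout (not confirmed by the package docs):
    keypoints[p][k] -> (x, y), scores[p][k] -> confidence.
    Works on nested lists and on NumPy arrays alike.
    """
    kept = []
    for person_kpts, person_scores in zip(keypoints, scores):
        person = [
            (k, float(x), float(y))
            for k, ((x, y), s) in enumerate(zip(person_kpts, person_scores))
            if s >= threshold
        ]
        kept.append(person)
    return kept


# Synthetic stand-in for `keypoints, scores = estimator(image)`.
keypoints = [[(100.0, 50.0), (110.0, 80.0), (0.0, 0.0)]]
scores = [[0.92, 0.41, 0.05]]
print(confident_keypoints(keypoints, scores))
```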

Or, for a simplified interface:

from spinepose.inference import infer_image, infer_video

# Single image inference
results = infer_image('path/to/image.jpg', vis_path='output.jpg')

# Video inference with optional temporal smoothing
results = infer_video('path/to/video.mp4', vis_path='output_video.mp4', use_smoothing=True)
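
The filter applied by `use_smoothing=True` is not described in this README. As a rough illustration of what temporal smoothing does to a keypoint track, the sketch below applies a plain exponential moving average. This is a stand-in technique, not the package's implementation, and `ema_smooth` and `alpha` are hypothetical names.

```python
def ema_smooth(track, alpha=0.5):
    """Exponentially smooth a sequence of (x, y) positions for one keypoint.

    alpha close to 1 follows the raw signal; alpha close to 0 smooths harder.
    Illustrative only: not the filter used by spinepose.
    """
    smoothed = []
    prev = None
    for x, y in track:
        if prev is None:
            prev = (float(x), float(y))  # first frame is passed through
        else:
            prev = (alpha * x + (1 - alpha) * prev[0],
                    alpha * y + (1 - alpha) * prev[1])
        smoothed.append(prev)
    return smoothed


# One jittery keypoint over four frames.
raw = [(100.0, 200.0), (104.0, 198.0), (96.0, 202.0), (100.0, 200.0)]
print(ema_smooth(raw))
```

A smaller `alpha` suppresses more jitter but makes the track lag behind fast motion, which is the usual trade-off when smoothing video keypoints.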

SpineTrack Dataset

SpineTrack is available on HuggingFace. The dataset comprises:

  • SpineTrack-Real: A collection of real-world images annotated with nine spinal keypoints in addition to standard body joints. An active learning pipeline, combining pretrained neural annotators and human corrections, refines keypoints across diverse poses.

  • SpineTrack-Unreal: A synthetic subset rendered using Unreal Engine, paired with precise ground truth from a biomechanically aligned OpenSim model. These synthetic images facilitate pretraining and complement real-world data.

To download:

git lfs install
git clone https://huggingface.co/datasets/saifkhichi96/spinetrack

Alternatively, use wget to download the dataset directly:

wget https://huggingface.co/datasets/saifkhichi96/spinetrack/resolve/main/annotations.zip
wget https://huggingface.co/datasets/saifkhichi96/spinetrack/resolve/main/images.zip

In both cases, you obtain two zip archives, annotations (24.8 MB) and images (19.4 GB), which can be unzipped to obtain the following structure:

spinetrack
├── annotations/
│   ├── person_keypoints_train-real-coco.json
│   ├── person_keypoints_train-real-yoga.json
│   ├── person_keypoints_train-unreal.json
│   └── person_keypoints_val2017.json
└── images/
    ├── train-real-coco/
    ├── train-real-yoga/
    ├── train-unreal/
    └── val2017/

All annotations are in COCO format and can be used with standard pose estimation libraries.
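
For a quick look at the annotations without a pose library, the sketch below groups keypoints per image using only the standard library. It relies on the COCO keypoint convention, where each annotation stores a flat `keypoints` array of x, y, visibility triplets and the category lists the keypoint names. The helper names are hypothetical, and the sample data is synthetic with placeholder keypoint names.

```python
import json
from collections import defaultdict


def load_coco(path):
    """Load a COCO-style annotation file from disk."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def keypoints_by_image(coco):
    """Map image_id -> list of per-person keypoint lists [(x, y, v), ...]."""
    grouped = defaultdict(list)
    for ann in coco.get("annotations", []):
        flat = ann.get("keypoints", [])
        # Group the flat array into (x, y, visibility) triplets.
        triplets = [tuple(flat[i:i + 3]) for i in range(0, len(flat) - 2, 3)]
        grouped[ann["image_id"]].append(triplets)
    return dict(grouped)


# Minimal synthetic COCO-style document. For the real data, call e.g.
# load_coco("spinetrack/annotations/person_keypoints_val2017.json").
sample = {
    "categories": [{"id": 1, "name": "person", "keypoints": ["a", "b"]}],
    "annotations": [
        {"image_id": 7, "category_id": 1, "num_keypoints": 2,
         "keypoints": [12, 34, 2, 56, 78, 1]},
    ],
}
names = sample["categories"][0]["keypoints"]  # placeholder names
for image_id, people in keypoints_by_image(sample).items():
    for person in people:
        print(image_id, dict(zip(names, person)))
```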

Results and Evaluation

We benchmark SpinePose against state-of-the-art lightweight pose estimation methods on COCO, Halpe, and our SpineTrack dataset. The results are summarized below; the SpinePose models are the rows trained on SpineTrack. Only 26 body keypoints are used for Halpe evaluations.

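| Method | Train Data | Kpts | COCO AP | COCO AR | Halpe26 AP | Halpe26 AR | Body AP | Body AR | Feet AP | Feet AR | Spine AP | Spine AR | Overall AP | Overall AR | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SimCC-MBV2 | COCO | 17 | 62.0 | 67.8 | 33.2 | 43.9 | 72.1 | 75.6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.1 | 2.29 | 0.31 |
| RTMPose-t | Body8 | 26 | 65.9 | 71.3 | 68.0 | 73.2 | 76.9 | 80.0 | 74.1 | 79.7 | 0.0 | 0.0 | 15.8 | 17.9 | 3.51 | 0.37 |
| RTMPose-s | Body8 | 26 | 69.7 | 74.7 | 72.0 | 76.7 | 80.9 | 83.6 | 78.9 | 83.5 | 0.0 | 0.0 | 17.2 | 19.4 | 5.70 | 0.70 |
| SpinePose-s | SpineTrack | 37 | 68.2 | 73.1 | 70.6 | 75.2 | 79.1 | 82.1 | 77.5 | 82.9 | 89.6 | 90.7 | 84.2 | 86.2 | 5.98 | 0.72 |
| SimCC-ViPNAS | COCO | 17 | 69.5 | 75.5 | 36.9 | 49.7 | 79.6 | 83.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.2 | 8.65 | 0.80 |
| RTMPose-m | Body8 | 26 | 75.1 | 80.0 | 76.7 | 81.3 | 85.5 | 87.9 | 84.1 | 88.2 | 0.0 | 0.0 | 19.4 | 21.4 | 13.93 | 1.95 |
| SpinePose-m | SpineTrack | 37 | 73.0 | 77.5 | 75.0 | 79.2 | 84.0 | 86.4 | 83.5 | 87.4 | 91.4 | 92.5 | 88.0 | 89.5 | 14.34 | 1.98 |
| RTMPose-l | Body8 | 26 | 76.9 | 81.5 | 78.4 | 82.9 | 86.8 | 89.2 | 86.9 | 90.0 | 0.0 | 0.0 | 20.0 | 22.0 | 28.11 | 4.19 |
| RTMW-m | Cocktail14 | 133 | 73.8 | 78.7 | 63.8 | 68.5 | 84.3 | 86.7 | 83.0 | 87.2 | 0.0 | 0.0 | 6.2 | 7.6 | 32.26 | 4.31 |
| SimCC-ResNet50 | COCO | 17 | 72.1 | 78.2 | 38.7 | 51.6 | 81.8 | 85.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.2 | 36.75 | 5.50 |
| SpinePose-l | SpineTrack | 37 | 75.2 | 79.5 | 77.0 | 81.1 | 85.4 | 87.7 | 85.5 | 89.2 | 91.0 | 92.2 | 88.4 | 90.0 | 28.66 | 4.22 |
| SimCC-ResNet50* | COCO | 17 | 73.4 | 79.0 | 39.8 | 52.4 | 83.2 | 86.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.3 | 0.3 | 43.29 | 12.42 |
| RTMPose-x* | Body8 | 26 | 78.8 | 83.4 | 80.0 | 84.4 | 88.6 | 90.6 | 88.4 | 91.4 | 0.0 | 0.0 | 21.0 | 22.9 | 50.00 | 17.29 |
| RTMW-l* | Cocktail14 | 133 | 75.6 | 80.4 | 65.4 | 70.1 | 86.0 | 88.3 | 85.6 | 89.2 | 0.0 | 0.0 | 8.1 | 8.1 | 57.20 | 7.91 |
| RTMW-l* | Cocktail14 | 133 | 77.2 | 82.3 | 66.6 | 71.8 | 87.3 | 89.9 | 88.3 | 91.3 | 0.0 | 0.0 | 8.6 | 8.6 | 57.35 | 17.69 |
| SpinePose-x* | SpineTrack | 37 | 75.9 | 80.1 | 77.6 | 81.8 | 86.3 | 88.5 | 86.3 | 89.7 | 89.3 | 91.0 | 88.9 | 89.9 | 50.69 | 17.37 |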
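
As a worked reading of the Params and FLOPs columns, the sketch below computes the relative cost of each SpinePose model over the RTMPose model with the same size suffix. Pairing the models by suffix is an assumption made for illustration; the numbers are copied from the table.

```python
# (params in millions, FLOPs in billions), copied from the results table.
rtmpose = {"s": (5.70, 0.70), "m": (13.93, 1.95),
           "l": (28.11, 4.19), "x": (50.00, 17.29)}
spinepose = {"s": (5.98, 0.72), "m": (14.34, 1.98),
             "l": (28.66, 4.22), "x": (50.69, 17.37)}


def relative_overhead(new, base):
    """Percentage increase of `new` over `base`."""
    return 100.0 * (new - base) / base


for size in rtmpose:
    (p0, f0), (p1, f1) = rtmpose[size], spinepose[size]
    print(f"{size}: params +{relative_overhead(p1, p0):.1f}%, "
          f"FLOPs +{relative_overhead(f1, f0):.1f}%")
```

Under this pairing, the added spine keypoints cost less than 5% in parameters and less than 3% in FLOPs at every model size.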

For evaluation instructions and to reproduce the results reported in our paper, please refer to the evaluation branch of this repository:

git clone https://github.com/dfki-av/spinepose.git
cd spinepose
git checkout evaluation

The README in the evaluation branch provides detailed steps for setting up the evaluation environment and running the evaluation scripts on the SpineTrack dataset.


Citation

If this project or dataset proves helpful in your work, please cite:

@InProceedings{Khan_2025_CVPR,
    author    = {Khan, Muhammad Saif Ullah and Krau{\ss}, Stephan and Stricker, Didier},
    title     = {Towards Unconstrained 2D Pose Estimation of the Human Spine},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {6171-6180}
}

License

This project is released under the CC-BY-NC-4.0 License. Commercial use is prohibited, and appropriate attribution is required for research or educational applications.