Jonas Hein*, Matthias Seibold*, Federica Bogo, Mazda Farshad, Marc Pollefeys, Philipp Fürnstahl°, Nassir Navab°

* equal first authors, ° equal last authors

Technical University of Munich (TUM)
ETH Zurich
Balgrist University Hospital

Abstract

We present the first data generation pipelines for hand and surgical tool pose estimation in open surgery, addressing a critical gap in markerless intra-operative tracking.

Purpose: To enable markerless tracking research, we created synthetic and real data generation pipelines for hand and surgical tool pose estimation in open surgery and used them to train and evaluate RGB-based pose estimation baselines.

Methods: We synthesize training images by generating plausible hand grasps for surgical tools and rendering the resulting hand–tool pairs; in addition, we capture and annotate a real dataset. On both datasets we evaluate the RGB-based pose estimation baselines HandObjectNet and PVNet, as well as their combination.

Results: The best baseline achieves 16.7 mm mean 3D vertex error on synthetic data and 13.8 mm on real data.

Conclusion: Synthetic pretraining followed by fine-tuning on real data significantly improves pose estimation performance for surgical tools and hands.
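The mean 3D vertex error quoted in the results can be sketched as follows. This is a minimal illustration of the metric, not the project's exact evaluation code; `mean_vertex_error_mm` is a hypothetical helper that assumes corresponding predicted and ground-truth mesh vertices given in millimetres:

```python
import numpy as np

def mean_vertex_error_mm(pred_vertices, gt_vertices):
    """Mean 3D vertex error: average Euclidean distance between
    corresponding predicted and ground-truth mesh vertices (mm)."""
    pred = np.asarray(pred_vertices, dtype=float)
    gt = np.asarray(gt_vertices, dtype=float)
    # Per-vertex Euclidean distances, then the mean over all vertices.
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy example: two vertices, each displaced by 3 mm along one axis.
pred = np.array([[3.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
gt = np.zeros((2, 3))
print(mean_vertex_error_mm(pred, gt))  # 3.0
```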


Code

All code is available on GitHub under jonashein:

| Repository | Description |
| --- | --- |
| grasp_generator | Generate hand grasps for arbitrary objects |
| grasp_renderer | Render hand+object pairs to synthetic images |
| handobject_dataset_creator | Capture and label real hand+object data |
| handobjectnet_baseline | HandObjectNet pose estimation baseline |
| pvnet_baseline | PVNet pose estimation baseline |
| baseline_combination | Combine multiple baselines |
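The repositories above can be fetched in one pass. A minimal sketch, assuming each repository lives at the default GitHub URL under the `jonashein` account (the snippet prints the clone commands; pipe its output into `sh` to run them):

```shell
# Repositories of the hand/tool pose estimation pipelines.
repos="grasp_generator grasp_renderer handobject_dataset_creator \
handobjectnet_baseline pvnet_baseline baseline_combination"

# Print one `git clone` command per repository.
for repo in $repos; do
  echo "git clone https://github.com/jonashein/${repo}.git"
done
```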

Datasets & Checkpoints

| Resource | Download |
| --- | --- |
| Synthetic Dataset | syn_colibri_v1.zip |
| Real Dataset | real_colibri_v1.zip |
| Pretrained Models | pretrained_models.zip |

Example Frames

Synthetic Dataset

[Example frames: synthetic RGB image, ground-truth pose overlay, and object mask.]

Real Dataset

[Example frames: real RGB image, ground-truth pose overlay, and object mask.]