Jonas Hein*, Matthias Seibold*, Federica Bogo, Mazda Farshad, Marc Pollefeys, °Philipp Fürnstahl, °Nassir Navab
* equal first authors, ° equal last authors



## Abstract
**Purpose:** We present the first synthetic and real data generation pipelines for hand and surgical tool pose estimation in open surgery, addressing a critical gap in markerless intra-operative tracking and enabling the evaluation of RGB-based pose estimation baselines.
**Methods:**
- A rendering pipeline that produces realistic synthetic training data for inexpensive network pretraining
- A capture pipeline that labels real images with ground-truth poses in an experimental setup
- Three RGB-based pose estimation baselines evaluated on both datasets

**Results:** The best baseline achieves a mean 3D vertex error of 16.7 mm on synthetic data and 13.8 mm on real data.

**Conclusion:** Pretraining on synthetic data followed by fine-tuning on real data significantly improves pose estimation performance for surgical tools and hands.
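For reference, the mean 3D vertex error reported in the results is the average Euclidean distance between corresponding predicted and ground-truth mesh vertices. A minimal sketch of the metric (a hypothetical helper, not the project's evaluation code):

```python
import math

def mean_vertex_error(pred, gt):
    """Average Euclidean distance between corresponding vertices.

    `pred` and `gt` are equal-length sequences of (x, y, z) tuples;
    the result is in the same units as the input (e.g. millimetres).
    """
    assert len(pred) == len(gt), "vertex counts must match"
    total = 0.0
    for (px, py, pz), (gx, gy, gz) in zip(pred, gt):
        total += math.sqrt((px - gx) ** 2 + (py - gy) ** 2 + (pz - gz) ** 2)
    return total / len(pred)

# Toy example: every predicted vertex is off by 3 mm along x.
pred = [(3.0, 0.0, 0.0), (13.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(mean_vertex_error(pred, gt))  # 3.0
```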
## Code
All code is available on GitHub under [jonashein](https://github.com/jonashein):
| Repository | Description |
|---|---|
| grasp_generator | Generate hand grasps for arbitrary objects |
| grasp_renderer | Render hand+object pairs to synthetic images |
| handobject_dataset_creator | Capture and label real hand+object data |
| handobjectnet_baseline | HandObjectNet pose estimation baseline |
| pvnet_baseline | PVNet pose estimation baseline |
| baseline_combination | Combine multiple baselines |
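To fetch all of the repositories above in one go, a short helper can print the required clone commands (assuming the repositories live under `github.com/jonashein`, as stated above):

```shell
# Print a git clone command for each project repository
# (hosted under github.com/jonashein).
for repo in grasp_generator grasp_renderer handobject_dataset_creator \
            handobjectnet_baseline pvnet_baseline baseline_combination; do
  echo "git clone https://github.com/jonashein/${repo}.git"
done
```

Pipe the output to `sh` to perform the clones.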
## Datasets & Checkpoints
| Resource | Download |
|---|---|
| Synthetic Dataset | syn_colibri_v1.zip |
| Real Dataset | real_colibri_v1.zip |
| Pretrained Models | pretrained_models.zip |
## Example Frames

### Synthetic Dataset



### Real Dataset


