Estimating 2D Keypoints of Surgical Tools Using Vision-Language Models with Low-Rank Adaptation
By: Krit Duangprom, Tryphon Lambrou, Binod Bhattarai
Potential Business Impact:
Helps robots see and grab tiny surgical tools.
This paper presents a novel pipeline for 2D keypoint estimation of surgical tools by leveraging Vision-Language Models (VLMs) fine-tuned using a low-rank adaptation (LoRA) technique. Unlike traditional Convolutional Neural Network (CNN) or Transformer-based approaches, which often suffer from overfitting on small-scale medical datasets, our method harnesses the generalization capabilities of pre-trained VLMs. We carefully design prompts to create an instruction-tuning dataset and use them to align visual features with semantic keypoint descriptions. Experimental results show that with only two epochs of fine-tuning, the adapted VLM outperforms the baseline models, demonstrating the effectiveness of LoRA in low-resource scenarios. This approach not only improves keypoint detection performance but also paves the way for future work in 3D pose estimation of surgical hands and tools.
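The abstract does not specify an implementation, but the setup it describes, a pre-trained VLM with LoRA adapters, trained on prompts that ask for tool keypoints, can be sketched as follows. This is a minimal illustration assuming a LLaVA-style backbone loaded through Hugging Face transformers with adapters injected via the peft library; the model choice, prompt wording, target modules, and hyperparameters are placeholders, not the authors' settings.

```python
# Minimal sketch of LoRA fine-tuning a VLM for 2D keypoint estimation.
# Assumptions (not from the paper): LLaVA-style backbone, peft for LoRA,
# adapters on the attention projections, illustrative hyperparameters.
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "llava-hf/llava-1.5-7b-hf"  # hypothetical backbone choice
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
)

# Low-rank adapters on the attention projections; the backbone stays frozen,
# preserving the pre-trained generalization the abstract credits for
# avoiding overfitting on small surgical datasets.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable

# One instruction-tuning example: the prompt asks for a named keypoint and
# the target is its pixel coordinates rendered as text (format assumed).
example = (
    "USER: <image>\nLocate the tip of the needle driver and answer with "
    "its (x, y) pixel coordinates.\nASSISTANT: (412, 233)"
)
```

Rendering keypoint coordinates as text tokens lets the VLM's existing autoregressive head serve as the detector, which is consistent with the abstract's claim that aligning visual features with semantic keypoint descriptions transfers after only a couple of epochs of fine-tuning.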
Similar Papers
Scaling Down to Scale Up: Towards Operationally-Efficient and Deployable Clinical Models via Cross-Modal Low-Rank Adaptation for Medical Vision-Language Models
CV and Pattern Recognition
Helps doctors find diseases in CT scans faster.
Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models
CV and Pattern Recognition
Lets computers understand pictures even in bad light.
Challenging Vision-Language Models with Surgical Data: A New Dataset and Broad Benchmarking Study
CV and Pattern Recognition
Helps doctors see better during surgery.