PathFLIP: Fine-grained Language-Image Pretraining for Versatile Computational Pathology

Published: December 19, 2025 | arXiv ID: 2512.17621v1

By: Fengchun Liu, Songhan Jiang, Linghan Cai, and more

While Vision-Language Models (VLMs) have achieved notable progress in computational pathology (CPath), the gigapixel scale and spatial heterogeneity of Whole Slide Images (WSIs) continue to pose challenges for multimodal understanding. Existing alignment methods struggle to capture fine-grained correspondences between textual descriptions and visual cues across thousands of patches from a slide, compromising their performance on downstream tasks. In this paper, we propose PathFLIP (Pathology Fine-grained Language-Image Pretraining), a novel framework for holistic WSI interpretation. PathFLIP decomposes slide-level captions into region-level subcaptions and generates text-conditioned region embeddings to facilitate precise visual-language grounding. By harnessing Large Language Models (LLMs), PathFLIP can seamlessly follow diverse clinical instructions and adapt to varied diagnostic contexts. Furthermore, it exhibits versatile capabilities across multiple paradigms, efficiently handling slide-level classification and retrieval, fine-grained lesion localization, and instruction following. Extensive experiments demonstrate that PathFLIP outperforms existing large-scale pathological VLMs on four representative benchmarks while requiring significantly less training data, paving the way for fine-grained, instruction-aware WSI interpretation in clinical practice.
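To make the core idea concrete, below is a minimal, hypothetical sketch of what "text-conditioned region embeddings" could look like in PyTorch: each region-level subcaption embedding acts as a cross-attention query over a slide's patch embeddings, yielding one region embedding per subcaption plus attention weights that read as a soft localization map. The module name, dimensions, and the InfoNCE-style alignment loss are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextConditionedRegionPooling(nn.Module):
    """Illustrative sketch (not PathFLIP's actual code): pool a slide's
    patch embeddings into one region embedding per subcaption via
    cross-attention, so each textual region description attends to the
    patches that support it."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # Subcaption embeddings serve as queries; patch embeddings as keys/values.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, subcaption_emb: torch.Tensor, patch_emb: torch.Tensor):
        # subcaption_emb: (B, num_subcaptions, dim) -- one query per region description
        # patch_emb:      (B, num_patches, dim)     -- tiled WSI patch features
        region_emb, attn_weights = self.attn(
            query=subcaption_emb, key=patch_emb, value=patch_emb
        )
        # attn_weights (B, num_subcaptions, num_patches) can be read as a soft
        # localization map linking each subcaption to supporting patches.
        return self.norm(region_emb), attn_weights


# Toy usage: 2 slides, 1000 patches each, 4 subcaptions per slide.
pool = TextConditionedRegionPooling(dim=512)
patches = torch.randn(2, 1000, 512)
subcaps = torch.randn(2, 4, 512)
regions, weights = pool(subcaps, patches)

# Generic InfoNCE-style alignment between region embeddings and their
# subcaptions (a standard contrastive objective, shown only as an example).
regions_f = F.normalize(regions.reshape(-1, 512), dim=-1)
texts_f = F.normalize(subcaps.reshape(-1, 512), dim=-1)
logits = regions_f @ texts_f.t() / 0.07
loss = F.cross_entropy(logits, torch.arange(logits.size(0)))
```

Under this reading, the same attention weights that drive region pooling double as a mechanism for fine-grained lesion localization, which would explain how one pretraining objective supports slide-level retrieval and patch-level grounding alike.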

Category
Computer Science: Computer Vision and Pattern Recognition