FocalOrder: Focal Preference Optimization for Reading Order Detection

Published: January 12, 2026 | arXiv ID: 2601.07483v1

By: Fuyuan Liu, Dianyu Yu, He Ren and more

Potential Business Impact:

Helps computers understand tricky document layouts better.

Business Areas:

Image Recognition, Data and Analytics, Software

Reading order detection is the foundation of document understanding. Most existing methods rely on uniform supervision, implicitly assuming a constant difficulty distribution across layout regions. In this work, we challenge this assumption by revealing a critical flaw: Positional Disparity, a phenomenon where models demonstrate mastery over the deterministic start and end regions but suffer a performance collapse in the complex intermediate sections. This degradation arises because standard training allows the massive volume of easy patterns to drown out the learning signals from difficult layouts. To address this, we propose FocalOrder, a framework driven by Focal Preference Optimization (FPO). Specifically, FocalOrder employs adaptive difficulty discovery with an exponential moving average mechanism to dynamically pinpoint hard-to-learn transitions, and introduces a difficulty-calibrated pairwise ranking objective to enforce global logical consistency. Extensive experiments demonstrate that FocalOrder establishes new state-of-the-art results on OmniDocBench v1.0 and Comp-HRDoc. Our compact model not only outperforms competitive specialized baselines but also significantly surpasses large-scale general VLMs. These results demonstrate that aligning the optimization with the intrinsic structural ambiguity of documents is critical for mastering complex document structures.
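The abstract names two ingredients, an exponential moving average (EMA) that tracks which transitions are hard and a pairwise ranking objective scaled by that difficulty, but gives no formulas. The sketch below is one plausible reading, not the authors' implementation: the class `EmaDifficultyTracker`, the function `weighted_pairwise_loss`, the parameters `decay` and `gamma`, and the `(1 + gamma * difficulty)` weighting are all assumptions made for illustration.

```python
import math


class EmaDifficultyTracker:
    """Hypothetical helper: keeps an EMA of the observed error per transition key."""

    def __init__(self, decay=0.9):
        self.decay = decay  # assumed smoothing factor, not taken from the paper
        self.scores = {}

    def update(self, key, error):
        # The first observation initialises the average; later ones are blended in.
        prev = self.scores.get(key, error)
        self.scores[key] = self.decay * prev + (1.0 - self.decay) * error
        return self.scores[key]

    def difficulty(self, key):
        # Transitions never seen so far are treated as having zero difficulty.
        return self.scores.get(key, 0.0)


def weighted_pairwise_loss(score_first, score_second, difficulty, gamma=1.0):
    """Logistic ranking loss for a pair in which `score_first` should be higher.

    The loss is scaled by (1 + gamma * difficulty), an assumed calibration rule,
    so that harder transitions contribute more to the training signal.
    """
    x = score_second - score_first  # positive when the pair is mis-ordered
    # Numerically stable softplus: log(1 + exp(x)).
    base = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return (1.0 + gamma * difficulty) * base


tracker = EmaDifficultyTracker(decay=0.5)
tracker.update("block_3->block_7", 1.0)  # a transition the model got wrong
tracker.update("block_3->block_7", 0.0)  # the same transition, now correct
loss = weighted_pairwise_loss(0.2, 0.4, tracker.difficulty("block_3->block_7"))
```

Under these assumptions, a transition that keeps producing errors accumulates a high EMA score and its ranking loss is amplified, while easy transitions decay toward the unweighted loss. The transition key `"block_3->block_7"` is a made-up label.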

FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings

Computation and Language

Teaches AI to learn human choices better.

11 Jan 2025 2

86%

Mind the Generative Details: Direct Localized Detail Preference Optimization for Video Diffusion Models

CV and Pattern Recognition

Makes AI videos look more real and flow better.

7 Jan 2026 0

86%

Page Count

17 pages
