Parallel Thinking, Sequential Answering: Bridging NAR and AR for Efficient Reasoning
By: Qihang Ai, Haiyun Jiang
Potential Business Impact:
Makes computers solve hard problems much faster.
We study reasoning tasks through a framework that integrates auto-regressive (AR) and non-autoregressive (NAR) language models. AR models, which generate text sequentially, excel at producing coherent outputs but often suffer from slow inference, particularly in reasoning-intensive domains such as mathematics and code, where lengthy chains of thought are required. In contrast, NAR models, such as discrete diffusion models, allow parallel generation and offer substantial speedups, though typically at the cost of reduced output quality. To address these limitations, we introduce a new paradigm in which an NAR model efficiently produces intermediate reasoning traces, which subsequently guide an AR model to deliver precise final answers. Experiments demonstrate that our approach yields a significant 26% improvement over strong baselines while substantially reducing inference cost.
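The two-stage design is easy to picture in code. Below is a minimal sketch of the control flow, assuming hypothetical nar_generate_trace and ar_generate_answer interfaces and a made-up prompt format (the abstract does not specify its API or models): the NAR stage drafts a full reasoning trace in a few parallel refinement steps, and the AR stage conditions on that draft to decode only the short final answer.

# Minimal sketch of "parallel thinking, sequential answering".
# The function names, prompt format, and placeholder returns are
# assumptions for illustration, not the paper's implementation.

def nar_generate_trace(problem: str, refine_steps: int = 16) -> str:
    """Stage 1 (parallel thinking): a real system would run a discrete
    diffusion LM that denoises all trace tokens in parallel, so cost
    scales with refine_steps rather than with trace length."""
    # Placeholder draft; swap in an actual NAR model call here.
    return f"[draft chain of thought for: {problem}]"

def ar_generate_answer(problem: str, trace: str) -> str:
    """Stage 2 (sequential answering): a causal LM conditions on the
    problem plus the NAR draft, then decodes only the short final
    answer token by token, preserving AR-level precision."""
    prompt = f"Problem: {problem}\nDraft reasoning:\n{trace}\nFinal answer:"
    # Placeholder completion; swap in an actual AR model call on `prompt`.
    return f"[answer decoded from a {len(prompt)}-char prompt]"

def solve(problem: str) -> str:
    trace = nar_generate_trace(problem)        # fast, parallel drafting
    return ar_generate_answer(problem, trace)  # precise, sequential answer

print(solve("If 3x + 5 = 20, what is x?"))

The point of the split is where the token budget goes: the long chain of thought costs only a handful of parallel denoising passes, while the sequential decoder is reserved for the few tokens of the final answer.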
Similar Papers
SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation
Machine Learning (CS)
Makes AI think faster and better.
TiDAR: Think in Diffusion, Talk in Autoregression
Computation and Language
Makes computers write better and faster.
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Audio and Speech Processing
Makes computers talk like people faster.