
CoT-Pose: Chain-of-Thought Reasoning for 3D Pose Generation from Abstract Prompts

Published: August 11, 2025 | arXiv ID: 2508.07540v1

By: Junuk Cha, Jihyeon Kim

Potential Business Impact:

Makes computers create 3D human poses from simple, everyday descriptions.

Technical Abstract:

Recent advances in multi-modal large language models (MLLMs) and chain-of-thought (CoT) reasoning have led to significant progress in image and text generation tasks. However, the field of 3D human pose generation still faces critical limitations. Most existing text-to-pose models rely heavily on detailed (low-level) prompts that explicitly describe joint configurations, whereas humans tend to communicate actions and intentions in abstract (high-level) language. This mismatch poses a practical challenge for deploying pose generation systems in real-world scenarios. To bridge this gap, we introduce a novel framework that incorporates CoT reasoning into the pose generation process, enabling abstract prompts to be interpreted as accurate 3D human poses. We further propose a data synthesis pipeline that automatically generates triplets of abstract prompts, detailed prompts, and corresponding 3D poses for training. Experimental results demonstrate that our reasoning-enhanced model, CoT-Pose, can effectively generate plausible and semantically aligned poses from abstract textual inputs. This work highlights the importance of high-level understanding in pose generation and opens new directions for reasoning-enhanced approaches to human pose generation.
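The abstract pairs each abstract prompt with a detailed prompt and a 3D pose. Assuming the detailed prompt serves as the intermediate reasoning output, which the triplet format suggests, the data flow can be sketched in Python as shown below. This is not the authors' implementation. The names `PoseTriplet`, `cot_generate`, `toy_reasoner`, and `toy_text_to_pose` are hypothetical. The rule-based stubs stand in for the MLLM reasoning step and the text-to-pose model, and representing a pose as a short list of (x, y, z) joint positions is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumption: a pose is a list of (x, y, z) joint positions. Real systems
# typically use a full body model with many more joints.
Pose = List[Tuple[float, float, float]]


@dataclass
class PoseTriplet:
    """One training example: abstract prompt, detailed prompt, 3D pose."""
    abstract_prompt: str  # high-level intent, e.g. "celebrating a goal"
    detailed_prompt: str  # low-level description of the body configuration
    pose: Pose


def cot_generate(
    abstract_prompt: str,
    reason: Callable[[str], str],
    text_to_pose: Callable[[str], Pose],
) -> Tuple[str, Pose]:
    """Reason from an abstract prompt to a detailed prompt, then decode
    the detailed prompt into a pose (hypothetical two-stage flow)."""
    detailed_prompt = reason(abstract_prompt)
    pose = text_to_pose(detailed_prompt)
    return detailed_prompt, pose


def toy_reasoner(prompt: str) -> str:
    """Hypothetical stand-in for the MLLM chain-of-thought step."""
    lookup = {
        "celebrating a goal": "both arms raised overhead, legs straight",
    }
    # Fall back to a neutral description for unknown prompts.
    return lookup.get(prompt.lower(), "standing upright, arms relaxed at sides")


def toy_text_to_pose(detailed: str) -> Pose:
    """Hypothetical stand-in for a text-to-pose model.

    Returns three joints: pelvis, left wrist, right wrist.
    """
    wrist_height = 2.0 if "arms raised" in detailed else 0.8
    return [(0.0, 1.0, 0.0), (-0.3, wrist_height, 0.0), (0.3, wrist_height, 0.0)]


if __name__ == "__main__":
    detailed, pose = cot_generate("Celebrating a goal", toy_reasoner, toy_text_to_pose)
    triplet = PoseTriplet("Celebrating a goal", detailed, pose)
    print(triplet.detailed_prompt)  # both arms raised overhead, legs straight
    print(triplet.pose)
```

Passing the reasoning and decoding steps in as callables keeps the sketch neutral about which models fill those roles. Each generated (abstract prompt, detailed prompt, pose) result has the same shape as the training triplets the abstract describes.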

Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes

CV and Pattern Recognition

Helps computers understand 3D scenes the way people do.

19 Oct 2025

From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models

Computation and Language

Helps AI "think step-by-step" to solve harder problems.

17 Nov 2025


Page Count: 9 pages
