An Efficient Task-Oriented Dialogue Policy: Evolutionary Reinforcement Learning Injected by Elite Individuals
By: Yangyang Zhao, Ben Niu, Libo Qin, and more
Potential Business Impact:
Makes chatbots learn faster and smarter.
Deep Reinforcement Learning (DRL) is widely used in task-oriented dialogue systems to optimize dialogue policy, but it struggles to balance exploration and exploitation due to the high dimensionality of state and action spaces. This challenge often results in local optima or poor convergence. Evolutionary Algorithms (EAs) have proven effective at exploring the solution space of neural networks by maintaining population diversity. Inspired by this, we combine the global search capability of EAs with the local optimization of DRL to balance exploration and exploitation. Nevertheless, the inherent flexibility of natural language in dialogue tasks complicates this direct integration, leading to prolonged evolution times. We therefore propose an elite individual injection (EII) mechanism that improves the EA's search efficiency by adaptively introducing best-performing individuals into the population. Experiments across four datasets show that our approach significantly improves the balance between exploration and exploitation, boosting performance. Moreover, the EII mechanism demonstrably reduces exploration time, achieving an efficient integration of EA and DRL for task-oriented dialogue policy.
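The core idea can be illustrated with a toy sketch. The code below is not the paper's implementation: the one-dimensional "policy" parameter, the fitness function, the fixed injection schedule, and all names (`fitness`, `evolve`, `inject_every`) are illustrative assumptions. In the paper, individuals would be neural dialogue policies, fitness would be dialogue success, the elite would come from a concurrently trained DRL agent, and injection would be adaptive rather than periodic.

```python
import random

def fitness(policy, target=5.0):
    # Toy stand-in for dialogue success rate: closer to the target is better.
    return -abs(policy - target)

def evolve(pop, elite, generations=50, inject_every=5, mut=0.5, seed=0):
    """Toy EA loop with elite individual injection (EII).

    `elite` stands in for the best-performing DRL-trained policy; every
    `inject_every` generations it replaces the worst individual, a
    deliberately simplified version of the paper's adaptive injection.
    """
    rng = random.Random(seed)
    for gen in range(generations):
        # Selection: keep the top half of the population by fitness.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: len(pop) // 2]
        # Mutation: Gaussian perturbation of the selected parents.
        children = [p + rng.gauss(0, mut) for p in parents]
        pop = parents + children
        # Elite injection: periodically insert the DRL elite in place
        # of the worst individual, seeding the EA with a strong policy.
        if gen % inject_every == 0:
            pop.sort(key=fitness, reverse=True)
            pop[-1] = elite
    return max(pop, key=fitness)

# Random initial population; the injected elite (4.0) is already near
# the optimum (5.0), so it accelerates convergence.
init_rng = random.Random(1)
best = evolve(pop=[init_rng.uniform(-10, 10) for _ in range(8)], elite=4.0)
```

Because selection always retains the current parents, the population's best fitness is monotone non-decreasing, so injecting a strong elite can only speed up, never degrade, the search in this toy setting.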
Similar Papers
Evolutionary Policy Optimization
Machine Learning (CS)
Teaches robots to learn faster and better.
Parental Guidance: Efficient Lifelong Learning through Evolutionary Distillation
Robotics
Robots learn many skills by copying and improving.
Synergizing Reinforcement Learning and Genetic Algorithms for Neural Combinatorial Optimization
Machine Learning (CS)
Solves hard problems faster by combining learning and evolution.