Dynamic Parameter Memory: Temporary LoRA-Enhanced LLM for Long-Sequence Emotion Recognition in Conversation
By: Jialong Mai, Xiaofen Xing, Yawei Li, and more
Potential Business Impact:
Lets computers understand feelings in long talks.
Recent research has focused on applying speech large language models (SLLMs) to improve speech emotion recognition (SER). However, the inherently high frame rate of the speech modality severely limits an SLLM's signal processing and understanding capabilities. For example, an SLLM with a 4K context window can process only about 80 seconds of audio at a 50 Hz feature sampling rate before reaching its capacity limit. Input token compression methods used in SLLMs overlook the continuity and inertia of emotions across multiple conversation turns. This paper proposes a Dynamic Parameter Memory (DPM) mechanism with contextual semantics and sentence-level emotion encoding, enabling SLLMs to process unlimited-length audio with limited context windows. Specifically, DPM progressively encodes sentence-level information and emotions into a temporary LoRA module during inference to effectively "memorize" the contextual information. The authors trained an emotion SLLM as a backbone and incorporated DPM into inference for emotion recognition in conversation (ERC). Experimental results on the IEMOCAP dataset show that DPM significantly improves the emotion recognition capabilities of SLLMs when processing long audio sequences, achieving state-of-the-art performance.
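The capacity limit quoted in the abstract is simple arithmetic: context-window size divided by the audio feature frame rate. A minimal sketch of that budget calculation (illustrative only, not the authors' code; assumes one context token per audio frame, and `max_audio_seconds` is a hypothetical helper name):

```python
# Illustrative token-budget arithmetic from the abstract, assuming
# one context token is consumed per audio feature frame.
def max_audio_seconds(context_window_tokens: int, frame_rate_hz: float) -> float:
    """Longest audio an SLLM can ingest before filling its context window."""
    return context_window_tokens / frame_rate_hz

# A "4K" window of exactly 4000 tokens at 50 Hz gives the abstract's 80 s;
# a literal 4096-token window gives roughly 82 s.
print(max_audio_seconds(4000, 50.0))  # 80.0
print(max_audio_seconds(4096, 50.0))  # 81.92
```

This is why a multi-minute conversation cannot fit in the window at once, motivating DPM's approach of folding past sentences into temporary LoRA parameters instead of keeping their tokens in context.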
Similar Papers
EmoSLLM: Parameter-Efficient Adaptation of LLMs for Speech Emotion Recognition
Audio and Speech Processing
Helps computers understand your feelings from your voice.
MemLoRA: Distilling Expert Adapters for On-Device Memory Systems
Machine Learning (CS)
Lets small phones remember and see things.
Dynamic Long Short-Term Memory Based Memory Storage For Long Horizon LLM Interaction
Computation and Language
Helps computers remember what you like.